[libc++][Format] Updates Unicode database. #125712

Merged: mordante merged 1 commit into llvm:main from review/update_unicode on Feb 5, 2025

Conversation

@mordante mordante commented Feb 4, 2025

Updates the database to Unicode release 16.0.0. The algorithms of the grapheme clustering rules have not changed.

@mordante mordante requested a review from a team as a code owner February 4, 2025 16:25
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Feb 4, 2025

llvmbot commented Feb 4, 2025

@llvm/pr-subscribers-libcxx

Author: Mark de Wever (mordante)

Changes

Updates the database to Unicode release 16.0.0. The algorithms of the grapheme clustering rules have not changed.


Patch is 591.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125712.diff

12 Files Affected:

  • (modified) libcxx/docs/ReleaseNotes/21.rst (+1)
  • (modified) libcxx/include/__format/escaped_output_table.h (+53-29)
  • (modified) libcxx/include/__format/extended_grapheme_cluster_table.h (+52-47)
  • (modified) libcxx/include/__format/indic_conjunct_break_table.h (+257-55)
  • (modified) libcxx/include/__format/width_estimation_table.h (+11-8)
  • (modified) libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.h (+531-813)
  • (modified) libcxx/utils/data/unicode/DerivedCoreProperties.txt (+682-152)
  • (modified) libcxx/utils/data/unicode/DerivedGeneralCategory.txt (+148-58)
  • (modified) libcxx/utils/data/unicode/EastAsianWidth.txt (+90-25)
  • (modified) libcxx/utils/data/unicode/GraphemeBreakProperty.txt (+62-34)
  • (modified) libcxx/utils/data/unicode/GraphemeBreakTest.txt (+185-279)
  • (modified) libcxx/utils/data/unicode/emoji-data.txt (+31-11)
diff --git a/libcxx/docs/ReleaseNotes/21.rst b/libcxx/docs/ReleaseNotes/21.rst
index 82f1de6bad3942..24393607970238 100644
--- a/libcxx/docs/ReleaseNotes/21.rst
+++ b/libcxx/docs/ReleaseNotes/21.rst
@@ -46,6 +46,7 @@ Improvements and New Features
 - The ``std::ranges::{copy, copy_n, copy_backward}`` algorithms have been optimized for ``std::vector<bool>::iterator``\s,
   resulting in a performance improvement of up to 2000x.
 
+- Updated formatting library to Unicode 16.0.0.
 
 Deprecations and Removals
 -------------------------
diff --git a/libcxx/include/__format/escaped_output_table.h b/libcxx/include/__format/escaped_output_table.h
index 7a0b35239861e0..1401b4637d8396 100644
--- a/libcxx/include/__format/escaped_output_table.h
+++ b/libcxx/include/__format/escaped_output_table.h
@@ -109,7 +109,7 @@ namespace __escaped_output_table {
 /// - bits [14, 31] The lower bound code point of the range. The upper bound of
 ///   the range is lower bound + size. Note the code expects code units the fit
 ///   into 18 bits, instead of the 21 bits needed for the full Unicode range.
-_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
+_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[735] = {
     0x00000020 /* 00000000 - 00000020 [   33] */,
     0x001fc021 /* 0000007f - 000000a0 [   34] */,
     0x002b4000 /* 000000ad - 000000ad [    1] */,
@@ -136,7 +136,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x02170001 /* 0000085c - 0000085d [    2] */,
     0x0217c000 /* 0000085f - 0000085f [    1] */,
     0x021ac004 /* 0000086b - 0000086f [    5] */,
-    0x0223c008 /* 0000088f - 00000897 [    9] */,
+    0x0223c007 /* 0000088f - 00000896 [    8] */,
     0x02388000 /* 000008e2 - 000008e2 [    1] */,
     0x02610000 /* 00000984 - 00000984 [    1] */,
     0x02634001 /* 0000098d - 0000098e [    2] */,
@@ -331,12 +331,11 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x06a68005 /* 00001a9a - 00001a9f [    6] */,
     0x06ab8001 /* 00001aae - 00001aaf [    2] */,
     0x06b3c030 /* 00001acf - 00001aff [   49] */,
-    0x06d34002 /* 00001b4d - 00001b4f [    3] */,
-    0x06dfc000 /* 00001b7f - 00001b7f [    1] */,
+    0x06d34000 /* 00001b4d - 00001b4d [    1] */,
     0x06fd0007 /* 00001bf4 - 00001bfb [    8] */,
     0x070e0002 /* 00001c38 - 00001c3a [    3] */,
     0x07128002 /* 00001c4a - 00001c4c [    3] */,
-    0x07224006 /* 00001c89 - 00001c8f [    7] */,
+    0x0722c004 /* 00001c8b - 00001c8f [    5] */,
     0x072ec001 /* 00001cbb - 00001cbc [    2] */,
     0x07320007 /* 00001cc8 - 00001ccf [    8] */,
     0x073ec004 /* 00001cfb - 00001cff [    5] */,
@@ -364,7 +363,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x0830400e /* 000020c1 - 000020cf [   15] */,
     0x083c400e /* 000020f1 - 000020ff [   15] */,
     0x08630003 /* 0000218c - 0000218f [    4] */,
-    0x0909c018 /* 00002427 - 0000243f [   25] */,
+    0x090a8015 /* 0000242a - 0000243f [   22] */,
     0x0912c014 /* 0000244b - 0000245f [   21] */,
     0x0add0001 /* 00002b74 - 00002b75 [    2] */,
     0x0ae58000 /* 00002b96 - 00002b96 [    1] */,
@@ -393,16 +392,16 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x0c400004 /* 00003100 - 00003104 [    5] */,
     0x0c4c0000 /* 00003130 - 00003130 [    1] */,
     0x0c63c000 /* 0000318f - 0000318f [    1] */,
-    0x0c79000a /* 000031e4 - 000031ee [   11] */,
+    0x0c798008 /* 000031e6 - 000031ee [    9] */,
     0x0c87c000 /* 0000321f - 0000321f [    1] */,
     0x29234002 /* 0000a48d - 0000a48f [    3] */,
     0x2931c008 /* 0000a4c7 - 0000a4cf [    9] */,
     0x298b0013 /* 0000a62c - 0000a63f [   20] */,
     0x29be0007 /* 0000a6f8 - 0000a6ff [    8] */,
-    0x29f2c004 /* 0000a7cb - 0000a7cf [    5] */,
+    0x29f38001 /* 0000a7ce - 0000a7cf [    2] */,
     0x29f48000 /* 0000a7d2 - 0000a7d2 [    1] */,
     0x29f50000 /* 0000a7d4 - 0000a7d4 [    1] */,
-    0x29f68017 /* 0000a7da - 0000a7f1 [   24] */,
+    0x29f74014 /* 0000a7dd - 0000a7f1 [   21] */,
     0x2a0b4002 /* 0000a82d - 0000a82f [    3] */,
     0x2a0e8005 /* 0000a83a - 0000a83f [    6] */,
     0x2a1e0007 /* 0000a878 - 0000a87f [    8] */,
@@ -491,7 +490,8 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x41688000 /* 000105a2 - 000105a2 [    1] */,
     0x416c8000 /* 000105b2 - 000105b2 [    1] */,
     0x416e8000 /* 000105ba - 000105ba [    1] */,
-    0x416f4042 /* 000105bd - 000105ff [   67] */,
+    0x416f4002 /* 000105bd - 000105bf [    3] */,
+    0x417d000b /* 000105f4 - 000105ff [   12] */,
     0x41cdc008 /* 00010737 - 0001073f [    9] */,
     0x41d58009 /* 00010756 - 0001075f [   10] */,
     0x41da0017 /* 00010768 - 0001077f [   24] */,
@@ -534,11 +534,15 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x432cc00c /* 00010cb3 - 00010cbf [   13] */,
     0x433cc006 /* 00010cf3 - 00010cf9 [    7] */,
     0x434a0007 /* 00010d28 - 00010d2f [    8] */,
-    0x434e8125 /* 00010d3a - 00010e5f [  294] */,
+    0x434e8005 /* 00010d3a - 00010d3f [    6] */,
+    0x43598002 /* 00010d66 - 00010d68 [    3] */,
+    0x43618007 /* 00010d86 - 00010d8d [    8] */,
+    0x436400cf /* 00010d90 - 00010e5f [  208] */,
     0x439fc000 /* 00010e7f - 00010e7f [    1] */,
     0x43aa8000 /* 00010eaa - 00010eaa [    1] */,
     0x43ab8001 /* 00010eae - 00010eaf [    2] */,
-    0x43ac804a /* 00010eb2 - 00010efc [   75] */,
+    0x43ac800f /* 00010eb2 - 00010ec1 [   16] */,
+    0x43b14036 /* 00010ec5 - 00010efb [   55] */,
     0x43ca0007 /* 00010f28 - 00010f2f [    8] */,
     0x43d68015 /* 00010f5a - 00010f6f [   22] */,
     0x43e28025 /* 00010f8a - 00010faf [   38] */,
@@ -578,7 +582,18 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x44d60004 /* 00011358 - 0001135c [    5] */,
     0x44d90001 /* 00011364 - 00011365 [    2] */,
     0x44db4002 /* 0001136d - 0001136f [    3] */,
-    0x44dd408a /* 00011375 - 000113ff [  139] */,
+    0x44dd400a /* 00011375 - 0001137f [   11] */,
+    0x44e28000 /* 0001138a - 0001138a [    1] */,
+    0x44e30001 /* 0001138c - 0001138d [    2] */,
+    0x44e3c000 /* 0001138f - 0001138f [    1] */,
+    0x44ed8000 /* 000113b6 - 000113b6 [    1] */,
+    0x44f04000 /* 000113c1 - 000113c1 [    1] */,
+    0x44f0c001 /* 000113c3 - 000113c4 [    2] */,
+    0x44f18000 /* 000113c6 - 000113c6 [    1] */,
+    0x44f2c000 /* 000113cb - 000113cb [    1] */,
+    0x44f58000 /* 000113d6 - 000113d6 [    1] */,
+    0x44f64007 /* 000113d9 - 000113e0 [    8] */,
+    0x44f8c01c /* 000113e3 - 000113ff [   29] */,
     0x45170000 /* 0001145c - 0001145c [    1] */,
     0x4518801d /* 00011462 - 0001147f [   30] */,
     0x45320007 /* 000114c8 - 000114cf [    8] */,
@@ -589,7 +604,8 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x45968005 /* 0001165a - 0001165f [    6] */,
     0x459b4012 /* 0001166d - 0001167f [   19] */,
     0x45ae8005 /* 000116ba - 000116bf [    6] */,
-    0x45b28035 /* 000116ca - 000116ff [   54] */,
+    0x45b28005 /* 000116ca - 000116cf [    6] */,
+    0x45b9001b /* 000116e4 - 000116ff [   28] */,
     0x45c6c001 /* 0001171b - 0001171c [    2] */,
     0x45cb0003 /* 0001172c - 0001172f [    4] */,
     0x45d1c0b8 /* 00011747 - 000117ff [  185] */,
@@ -609,7 +625,9 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x46920007 /* 00011a48 - 00011a4f [    8] */,
     0x46a8c00c /* 00011aa3 - 00011aaf [   13] */,
     0x46be4006 /* 00011af9 - 00011aff [    7] */,
-    0x46c280f5 /* 00011b0a - 00011bff [  246] */,
+    0x46c280b5 /* 00011b0a - 00011bbf [  182] */,
+    0x46f8800d /* 00011be2 - 00011bef [   14] */,
+    0x46fe8005 /* 00011bfa - 00011bff [    6] */,
     0x47024000 /* 00011c09 - 00011c09 [    1] */,
     0x470dc000 /* 00011c37 - 00011c37 [    1] */,
     0x47118009 /* 00011c46 - 00011c4f [   10] */,
@@ -633,7 +651,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x47be4006 /* 00011ef9 - 00011eff [    7] */,
     0x47c44000 /* 00011f11 - 00011f11 [    1] */,
     0x47cec002 /* 00011f3b - 00011f3d [    3] */,
-    0x47d68055 /* 00011f5a - 00011faf [   86] */,
+    0x47d6c054 /* 00011f5b - 00011faf [   85] */,
     0x47ec400e /* 00011fb1 - 00011fbf [   15] */,
     0x47fc800c /* 00011ff2 - 00011ffe [   13] */,
     0x48e68065 /* 0001239a - 000123ff [  102] */,
@@ -642,8 +660,10 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x49510a4b /* 00012544 - 00012f8f [ 2636] */,
     0x4bfcc00c /* 00012ff3 - 00012fff [   13] */,
     0x4d0c000f /* 00013430 - 0001343f [   16] */,
-    0x4d158fa9 /* 00013456 - 000143ff [ 4010] */,
-    0x5191e1b8 /* 00014647 - 000167ff [ 8633] */,
+    0x4d158009 /* 00013456 - 0001345f [   10] */,
+    0x50fec004 /* 000143fb - 000143ff [    5] */,
+    0x5191dab8 /* 00014647 - 000160ff [ 6841] */,
+    0x584e86c5 /* 0001613a - 000167ff [ 1734] */,
     0x5a8e4006 /* 00016a39 - 00016a3f [    7] */,
     0x5a97c000 /* 00016a5f - 00016a5f [    1] */,
     0x5a9a8003 /* 00016a6a - 00016a6d [    4] */,
@@ -655,7 +675,8 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x5ad68000 /* 00016b5a - 00016b5a [    1] */,
     0x5ad88000 /* 00016b62 - 00016b62 [    1] */,
     0x5ade0004 /* 00016b78 - 00016b7c [    5] */,
-    0x5ae402af /* 00016b90 - 00016e3f [  688] */,
+    0x5ae401af /* 00016b90 - 00016d3f [  432] */,
+    0x5b5e80c5 /* 00016d7a - 00016e3f [  198] */,
     0x5ba6c064 /* 00016e9b - 00016eff [  101] */,
     0x5bd2c003 /* 00016f4b - 00016f4e [    4] */,
     0x5be20006 /* 00016f88 - 00016f8e [    7] */,
@@ -663,7 +684,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x5bf9400a /* 00016fe5 - 00016fef [   11] */,
     0x5bfc800d /* 00016ff2 - 00016fff [   14] */,
     0x61fe0007 /* 000187f8 - 000187ff [    8] */,
-    0x63358029 /* 00018cd6 - 00018cff [   42] */,
+    0x63358028 /* 00018cd6 - 00018cfe [   41] */,
     0x634262e6 /* 00018d09 - 0001afef [ 8935] */,
     0x6bfd0000 /* 0001aff4 - 0001aff4 [    1] */,
     0x6bff0000 /* 0001affc - 0001affc [    1] */,
@@ -678,7 +699,9 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x6f1f4002 /* 0001bc7d - 0001bc7f [    3] */,
     0x6f224006 /* 0001bc89 - 0001bc8f [    7] */,
     0x6f268001 /* 0001bc9a - 0001bc9b [    2] */,
-    0x6f28125f /* 0001bca0 - 0001ceff [ 4704] */,
+    0x6f280f5f /* 0001bca0 - 0001cbff [ 3936] */,
+    0x733e8005 /* 0001ccfa - 0001ccff [    6] */,
+    0x73ad004b /* 0001ceb4 - 0001ceff [   76] */,
     0x73cb8001 /* 0001cf2e - 0001cf2f [    2] */,
     0x73d1c008 /* 0001cf47 - 0001cf4f [    9] */,
     0x73f1003b /* 0001cfc4 - 0001cfff [   60] */,
@@ -730,7 +753,9 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x78abc010 /* 0001e2af - 0001e2bf [   17] */,
     0x78be8004 /* 0001e2fa - 0001e2fe [    5] */,
     0x78c001cf /* 0001e300 - 0001e4cf [  464] */,
-    0x793e82e5 /* 0001e4fa - 0001e7df [  742] */,
+    0x793e80d5 /* 0001e4fa - 0001e5cf [  214] */,
+    0x797ec003 /* 0001e5fb - 0001e5fe [    4] */,
+    0x798001df /* 0001e600 - 0001e7df [  480] */,
     0x79f9c000 /* 0001e7e7 - 0001e7e7 [    1] */,
     0x79fb0000 /* 0001e7ec - 0001e7ec [    1] */,
     0x79fbc000 /* 0001e7ef - 0001e7ef [    1] */,
@@ -800,18 +825,17 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x7e168005 /* 0001f85a - 0001f85f [    6] */,
     0x7e220007 /* 0001f888 - 0001f88f [    8] */,
     0x7e2b8001 /* 0001f8ae - 0001f8af [    2] */,
-    0x7e2c804d /* 0001f8b2 - 0001f8ff [   78] */,
+    0x7e2f0003 /* 0001f8bc - 0001f8bf [    4] */,
+    0x7e30803d /* 0001f8c2 - 0001f8ff [   62] */,
     0x7e95000b /* 0001fa54 - 0001fa5f [   12] */,
     0x7e9b8001 /* 0001fa6e - 0001fa6f [    2] */,
     0x7e9f4002 /* 0001fa7d - 0001fa7f [    3] */,
-    0x7ea24006 /* 0001fa89 - 0001fa8f [    7] */,
-    0x7eaf8000 /* 0001fabe - 0001fabe [    1] */,
-    0x7eb18007 /* 0001fac6 - 0001facd [    8] */,
-    0x7eb70003 /* 0001fadc - 0001fadf [    4] */,
-    0x7eba4006 /* 0001fae9 - 0001faef [    7] */,
+    0x7ea28004 /* 0001fa8a - 0001fa8e [    5] */,
+    0x7eb1c006 /* 0001fac7 - 0001facd [    7] */,
+    0x7eb74001 /* 0001fadd - 0001fade [    2] */,
+    0x7eba8005 /* 0001faea - 0001faef [    6] */,
     0x7ebe4006 /* 0001faf9 - 0001faff [    7] */,
     0x7ee4c000 /* 0001fb93 - 0001fb93 [    1] */,
-    0x7ef2c024 /* 0001fbcb - 0001fbef [   37] */,
     0x7efe8405 /* 0001fbfa - 0001ffff [ 1030] */,
     0xa9b8001f /* 0002a6e0 - 0002a6ff [   32] */,
     0xadce8005 /* 0002b73a - 0002b73f [    6] */,
diff --git a/libcxx/include/__format/extended_grapheme_cluster_table.h b/libcxx/include/__format/extended_grapheme_cluster_table.h
index 7653a9e03b815d..f76e018df7ae11 100644
--- a/libcxx/include/__format/extended_grapheme_cluster_table.h
+++ b/libcxx/include/__format/extended_grapheme_cluster_table.h
@@ -125,7 +125,7 @@ enum class __property : uint8_t {
 /// following benchmark.
 /// libcxx/benchmarks/std_format_spec_string_unicode.bench.cpp
 // clang-format off
-_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
+_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1501] = {
     0x00000091,
     0x00005005,
     0x00005811,
@@ -164,7 +164,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x00414842,
     0x0042c822,
     0x00448018,
-    0x0044c072,
+    0x0044b882,
     0x00465172,
     0x00471008,
     0x004719f2,
@@ -246,14 +246,12 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x0064101a,
     0x0065e002,
     0x0065f00a,
-    0x0065f802,
-    0x0066001a,
+    0x0065f812,
+    0x0066080a,
     0x00661002,
     0x0066181a,
-    0x00663002,
-    0x0066381a,
-    0x0066501a,
-    0x00666012,
+    0x00663022,
+    0x00665032,
     0x0066a812,
     0x00671012,
     0x0067980a,
@@ -318,10 +316,8 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x008b047c,
     0x008d457b,
     0x009ae822,
-    0x00b89022,
-    0x00b8a80a,
-    0x00b99012,
-    0x00b9a00a,
+    0x00b89032,
+    0x00b99022,
     0x00ba9012,
     0x00bb9012,
     0x00bda012,
@@ -361,29 +357,23 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x00d581e2,
     0x00d80032,
     0x00d8200a,
-    0x00d9a062,
-    0x00d9d80a,
-    0x00d9e002,
-    0x00d9e84a,
-    0x00da1002,
-    0x00da181a,
+    0x00d9a092,
+    0x00d9f03a,
+    0x00da1022,
     0x00db5882,
     0x00dc0012,
     0x00dc100a,
     0x00dd080a,
     0x00dd1032,
     0x00dd301a,
-    0x00dd4012,
-    0x00dd500a,
-    0x00dd5822,
+    0x00dd4052,
     0x00df3002,
     0x00df380a,
     0x00df4012,
     0x00df502a,
     0x00df6802,
     0x00df700a,
-    0x00df7822,
-    0x00df901a,
+    0x00df7842,
     0x00e1207a,
     0x00e16072,
     0x00e1a01a,
@@ -475,7 +465,8 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x0547f802,
     0x05493072,
     0x054a38a2,
-    0x054a901a,
+    0x054a900a,
+    0x054a9802,
     0x054b01c4,
     0x054c0022,
     0x054c180a,
@@ -484,7 +475,8 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x054db032,
     0x054dd01a,
     0x054de012,
-    0x054df02a,
+    0x054df01a,
+    0x054e0002,
     0x054f2802,
     0x05514852,
     0x0551781a,
@@ -1328,8 +1320,9 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x0851f802,
     0x08572812,
     0x08692032,
+    0x086b4842,
     0x08755812,
-    0x0877e822,
+    0x0877e032,
     0x087a30a2,
     0x087c1032,
     0x0880000a,
@@ -1357,7 +1350,8 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x088c100a,
     0x088d982a,
     0x088db082,
-    0x088df81a,
+    0x088df80a,
+    0x088e0002,
     0x088e1018,
     0x088e4832,
     0x088e700a,
@@ -1365,9 +1359,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x0891602a,
     0x08917822,
     0x0891901a,
-    0x0891a002,
-    0x0891a80a,
-    0x0891b012,
+    0x0891a032,
     0x0891f002,
     0x08920802,
     0x0896f802,
@@ -1381,11 +1373,24 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x089a0002,
     0x089a083a,
     0x089a381a,
-    0x089a582a,
+    0x089a581a,
+    0x089a6802,
     0x089ab802,
     0x089b101a,
     0x089b3062,
     0x089b8042,
+    0x089dc002,
+    0x089dc81a,
+    0x089dd852,
+    0x089e1002,
+    0x089e2802,
+    0x089e3822,
+    0x089e500a,
+    0x089e601a,
+    0x089e7022,
+    0x089e8808,
+    0x089e9002,
+    0x089f0812,
     0x08a1a82a,
     0x08a1c072,
     0x08a2001a,
@@ -1422,10 +1427,10 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x08b5600a,
     0x08b56802,
     0x08b5701a,
-    0x08b58052,
-    0x08b5b00a,
-    0x08b5b802,
-    0x08b8e822,
+    0x08b58072,
+    0x08b8e802,
+    0x08b8f00a,
+    0x08b8f802,
     0x08b91032,
     0x08b9300a,
     0x08b93842,
@@ -1436,9 +1441,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x08c98002,
     0x08c9884a,
     0x08c9b81a,
-    0x08c9d812,
-    0x08c9e80a,
-    0x08c9f002,
+    0x08c9d832,
     0x08c9f808,
     0x08ca000a,
     0x08ca0808,
@@ -1495,28 +1498,29 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x08f9a01a,
     0x08f9b042,
     0x08f9f01a,
-    0x08fa0002,
-    0x08fa080a,
-    0x08fa1002,
+    0x08fa0022,
+    0x08fad002,
     0x09a180f1,
     0x09a20002,
     0x09a238e2,
+    0x0b08f0b2,
+    0x0b09502a,
+    0x0b096822,
     0x0b578042,
     0x0b598062,
+    0x0b6b180c,
+    0x0b6b383c,
     0x0b7a7802,
     0x0b7a8b6a,
     0x0b7c7832,
     0x0b7f2002,
-    0x0b7f801a,
+    0x0b7f8012,
     0x0de4e812,
     0x0de50031,
     0x0e7802d2,
     0x0e798162,
-    0x0e8b2802,
-    0x0e8b300a,
-    0x0e8b3822,
-    0x0e8b680a,
-    0x0e8b7042,
+    0x0e8b2842,
+    0x0e8b6852,
     0x0e8b9871,
     0x0e8bd872,
     0x0e8c2862,
@@ -1538,6 +1542,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1496] = {
     0x0f157002,
     0x0f176032,
     0x0f276032,
+    0x0f2f7012,
     0x0f468062,
     0x0f4a2062,
     0x0f8007f3,
diff --git a/libcxx/include/__format/indic_conjunct_break_table.h b/libcxx/include/__format/indic_conjunct_break_table.h
index df6cfe6a02f348..f48ea625908e99 100644
--- a/libcxx/include/__format/indic_conjunct_break_table.h
+++ b/libcxx/include/__format/indic_conjunct_break_table.h
@@ -107,10 +107,9 @@ enum class __property : uint8_t {
 /// following benchmark.
 /// libcxx/benchmarks/std_format_spec_string_unicode.bench.cpp
 // clang-format off
-_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[201] = {
-    0x00180139,
-    0x001a807d,
-    0x00241811,
+_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[403] = {
+    0x001801bd,
+    0x00241819,
     0x002c88b1,
     0x002df801,
     0x002e0805,
@@ -125,6 +124,7 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[201] = {
     0x0037500d,
     0x00388801,
     0x00398069,
+    0x003d3029,
     0x003f5821,
     0x003fe801,
     0x0040b00d,
@@ -132,87 +132,174 @@ _LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[201] = {
     0x00412809,
     0x00414811,
     0x0042c809,
-    0x0044c01d,
+    0x0044b821,
     0x0046505d,
-    0x00471871,
+    0x0047187d,
     0x0048a890,
+    0x0049d001,
     0x0049e001,
+    0x004a081d,
     0x004a6802,
-    0x004a880d,
+    0x004a8819,
     0x004ac01c,
+    0x004b1005,
     0x004bc01c,
+    0x004c0801,
     0x004ca84c,
     0x004d5018,
     0x004d9000,
     0x004db00c,
     0x004de001,
+    0x004df001,
+    0x004e080d,
     0x004e6802,
+    0x004eb801,
     0x004ee004,
     0x004ef800,
+    0x004f1005,
     0x004f8004,
     0x004ff001,
+    0x00500805,
     0x0051e001,
+    0x00520805,
+    0x00523805,
+    0x00525809,
+    0x00528801,
+    0x00538005,
+    0x0053a801,
+    0x00540805,
     0x0054a84c,
     0x00555018,
     0x00559004,
     0x0055a810,
     0x0055e001,
+    0x00560811,
+    0x00563805,
     0x00566802,
+    0x00571005,
     0x0057c800,
+    0x0057d015,
+    0x00580801,
     0x0058a84c,
     0x00595018,
     0x00599004,
     0x0059a810,
     0x0059e001,
+    0x0059f005,
+    0x005a080d,
     0x005a6802,
+    0x005aa809,
     0x005ae004,
     0x005af800,
+    0x005b1005,
     0x005b8800,
+    0x005c1001,
+    0x005df001,
+    0x005e0001,
+    0x005e6801,
+    0x005eb801,
+    0x00600001,
+    0x00...
[truncated]
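
The header comment in `escaped_output_table.h` says that bits [14, 31] of each entry hold the lower bound of a range and that the upper bound is lower bound + size. The sketch below decodes and re-encodes such entries. It assumes the size sits in the low 14 bits, which is inferred from the inline comments in the diff rather than from the visible part of the header. `EscapedRange`, `decode_escaped_entry`, and `encode_escaped_entry` are hypothetical names for illustration, not libc++ identifiers.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration; not part of libc++.
struct EscapedRange {
  std::uint32_t lower; // first code point of the range
  std::uint32_t upper; // last code point of the range (inclusive)
};

// Decodes one table entry.
// bits [14, 31]: lower bound (per the header comment)
// bits [0, 13]:  size (assumed; inferred from the inline comments)
constexpr EscapedRange decode_escaped_entry(std::uint32_t entry) {
  return EscapedRange{entry >> 14, (entry >> 14) + (entry & 0x3fffu)};
}

// Inverse operation; valid only when lower fits in 18 bits and
// upper - lower fits in 14 bits.
constexpr std::uint32_t encode_escaped_entry(std::uint32_t lower, std::uint32_t upper) {
  return (lower << 14) | (upper - lower);
}

// Entry changed by this patch: 0000088f - 00000896 [8].
static_assert(decode_escaped_entry(0x0223c007u).lower == 0x088fu, "lower bound");
static_assert(decode_escaped_entry(0x0223c007u).upper == 0x0896u, "upper bound");
// Round trip for another changed entry: 00001c8b - 00001c8f [5].
static_assert(encode_escaped_entry(0x1c8bu, 0x1c8fu) == 0x0722c004u, "round trip");
```

Read this way, the first changed entry shows the range shrinking from 088f - 0897 to 088f - 0896, so U+0897 is no longer in the table after the update.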

github-actions bot commented Feb 4, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff fe694b18dc518b86eae9aab85ff03abc54e1662f 04fc5782a5bf720d04c12bd340ff0efa5b0a5845 --extensions cpp,h -- libcxx/include/__format/escaped_output_table.h libcxx/include/__format/extended_grapheme_cluster_table.h libcxx/include/__format/indic_conjunct_break_table.h libcxx/include/__format/width_estimation_table.h libcxx/test/libcxx/utilities/format/format.string/format.string.std/escaped_output.pass.cpp libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.h libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.pass.cpp
View the diff from clang-format here.
diff --git a/libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.h b/libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.h
index 9664622ab4..20386ee553 100644
--- a/libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.h
+++ b/libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.h
@@ -82,8 +82,8 @@ struct data {
 };
 
 /// The data for UTF-8.
-std::array<data<char>, 1093> data_utf8 = {{
-     {"\U00000020\U00000020", {32, 32}, {1, 2}},
+std::array<data<char>, 1093> data_utf8 = {
+    {{"\U00000020\U00000020", {32, 32}, {1, 2}},
      {"\U00000020\U00000308\U00000020", {32, 32}, {3, 4}},
      {"\U00000020\U0000000d", {32, 13}, {1, 2}},
      {"\U00000020\U00000308\U0000000d", {32, 13}, {3, 4}},
@@ -1183,8 +1183,8 @@ std::array<data<char>, 1093> data_utf8 = {{
 /// since the size of the code units differ the breaks can contain different
 /// values.
 #ifndef TEST_HAS_NO_WIDE_CHARACTERS
-std::array<data<wchar_t>, 1093> data_utf16 = {{
-     {L"\U00000020\U00000020", {32, 32}, {1, 2}},
+std::array<data<wchar_t>, 1093> data_utf16 = {
+    {{L"\U00000020\U00000020", {32, 32}, {1, 2}},
      {L"\U00000020\U00000308\U00000020", {32, 32}, {2, 3}},
      {L"\U00000020\U0000000d", {32, 13}, {1, 2}},
      {L"\U00000020\U00000308\U0000000d", {32, 13}, {2, 3}},
@@ -2283,8 +2283,8 @@ std::array<data<wchar_t>, 1093> data_utf16 = {{
 /// Note that most of the data for the UTF-16 and UTF-32 are identical. However
 /// since the size of the code units differ the breaks can contain different
 /// values.
-std::array<data<wchar_t>, 1093> data_utf32 = {{
-     {L"\U00000020\U00000020", {32, 32}, {1, 2}},
+std::array<data<wchar_t>, 1093> data_utf32 = {
+    {{L"\U00000020\U00000020", {32, 32}, {1, 2}},
      {L"\U00000020\U00000308\U00000020", {32, 32}, {2, 3}},
      {L"\U00000020\U0000000d", {32, 13}, {1, 2}},
      {L"\U00000020\U00000308\U0000000d", {32, 13}, {2, 3}},

Updates the database to Unicode release 16.0.0. The algorithms of
the grapheme clustering rules have not changed.
@mordante mordante force-pushed the review/update_unicode branch from 9e2d61b to 04fc578 Compare February 4, 2025 17:54

mordante commented Feb 5, 2025

Formatting error is in a generated file.

@mordante mordante merged commit 5b98be4 into llvm:main Feb 5, 2025
77 of 78 checks passed
@mordante mordante deleted the review/update_unicode branch February 5, 2025 17:55
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Updates the database to Unicode release 16.0.0. The algorithms of
the grapheme clustering rules have not changed.
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.