Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++][format] Discard contents since null-terminator in character arrays in formatting #116571

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

frederick-vs-ja
Copy link
Contributor

Currently, built-in char/wchar_t arrays are assumed to be null-terminated sequence with the terminator being the last element in formatting. This doesn't conform to [format.arg]/6.9.

otherwise, if decay_t<TD> is char_type* or const char_type*, initializes value with static_cast<const char_type*>(v);

The standard wording specifies that character arrays are decayed to pointers. When the null terminator is not the last element or there's no null terminator (the latter case is UB), libc++ currently produces different results.

Fixes #115935.

@frederick-vs-ja frederick-vs-ja requested a review from a team as a code owner November 18, 2024 02:19
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Nov 18, 2024
@frederick-vs-ja frederick-vs-ja added the format C++20 std::format or std::print, and anything related to them label Nov 18, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2024

@llvm/pr-subscribers-libcxx

Author: A. Jiang (frederick-vs-ja)

Changes

Currently, built-in char/wchar_t arrays are assumed to be null-terminated sequence with the terminator being the last element in formatting. This doesn't conform to [format.arg]/6.9.

> otherwise, if decay_t&lt;TD&gt; is char_type* or const char_type*, initializes value with static_cast&lt;const char_type*&gt;(v);

The standard wording specifies that character arrays are decayed to pointers. When the null terminator is not the last element or there's no null terminator (the latter case is UB), libc++ currently produces different results.

Fixes #115935.


Full diff: https://github.com/llvm/llvm-project/pull/116571.diff

2 Files Affected:

  • (modified) libcxx/include/__format/format_arg_store.h (+5-18)
  • (modified) libcxx/test/std/utilities/format/format.functions/format_tests.h (+4)
diff --git a/libcxx/include/__format/format_arg_store.h b/libcxx/include/__format/format_arg_store.h
index 8b2c95c657c9bd..78b142ebcdb743 100644
--- a/libcxx/include/__format/format_arg_store.h
+++ b/libcxx/include/__format/format_arg_store.h
@@ -101,20 +101,14 @@ consteval __arg_t __determine_arg_t() {
   return __arg_t::__long_double;
 }
 
-// Char pointer
+// Char pointer or array
 template <class _Context, class _Tp>
-  requires(same_as<typename _Context::char_type*, _Tp> || same_as<const typename _Context::char_type*, _Tp>)
+  requires(same_as<typename _Context::char_type*, _Tp> || same_as<const typename _Context::char_type*, _Tp>) ||
+          (is_array_v<_Tp> && same_as<_Tp, typename _Context::char_type[extent_v<_Tp>]>)
 consteval __arg_t __determine_arg_t() {
   return __arg_t::__const_char_type_ptr;
 }
 
-// Char array
-template <class _Context, class _Tp>
-  requires(is_array_v<_Tp> && same_as<_Tp, typename _Context::char_type[extent_v<_Tp>]>)
-consteval __arg_t __determine_arg_t() {
-  return __arg_t::__string_view;
-}
-
 // String view
 template <class _Context, class _Tp>
   requires(same_as<typename _Context::char_type, typename _Tp::value_type> &&
@@ -188,15 +182,8 @@ _LIBCPP_HIDE_FROM_ABI basic_format_arg<_Context> __create_format_arg(_Tp& __valu
   else if constexpr (__arg == __arg_t::__unsigned_long_long)
     return basic_format_arg<_Context>{__arg, static_cast<unsigned long long>(__value)};
   else if constexpr (__arg == __arg_t::__string_view)
-    // Using std::size on a character array will add the NUL-terminator to the size.
-    if constexpr (is_array_v<_Dp>)
-      return basic_format_arg<_Context>{
-          __arg, basic_string_view<typename _Context::char_type>{__value, extent_v<_Dp> - 1}};
-    else
-      // When the _Traits or _Allocator are different an implicit conversion will
-      // fail.
-      return basic_format_arg<_Context>{
-          __arg, basic_string_view<typename _Context::char_type>{__value.data(), __value.size()}};
+    return basic_format_arg<_Context>{
+        __arg, basic_string_view<typename _Context::char_type>{__value.data(), __value.size()}};
   else if constexpr (__arg == __arg_t::__ptr)
     return basic_format_arg<_Context>{__arg, static_cast<const void*>(__value)};
   else if constexpr (__arg == __arg_t::__handle)
diff --git a/libcxx/test/std/utilities/format/format.functions/format_tests.h b/libcxx/test/std/utilities/format/format.functions/format_tests.h
index b2ed6775fe8a13..33212fa663e4a5 100644
--- a/libcxx/test/std/utilities/format/format.functions/format_tests.h
+++ b/libcxx/test/std/utilities/format/format.functions/format_tests.h
@@ -3189,6 +3189,10 @@ void format_tests(TestFunction check, ExceptionTest check_exception) {
     const CharT* data = buffer;
     check(SV("hello 09azAZ!"), SV("hello {}"), data);
   }
+  {
+    CharT buffer[] = {CharT('a'), CharT('b'), CharT('c'), 0, CharT('d'), CharT('e'), CharT('f'), 0};
+    check(SV("hello abc"), SV("hello {}"), buffer);
+  }
   {
     std::basic_string<CharT> data = STR("world");
     check(SV("hello world"), SV("hello {}"), data);

Currently, built-in `char`/`wchar_t` arrays are assumed to be
null-terminated sequence with the terminator being the last element in
formatting. This doesn't conform to [format.arg]/6.9.

> otherwise, if `decay_t<TD>` is `char_type*` or `const char_type*`,
> initializes value with `static_cast<const char_type*>(v)`;

The standard wording specifies that character arrays are decayed to
pointers. When the null terminator is not the last element or there's no
null terminator (the latter case is UB), libc++ currently produces
different results.
@frederick-vs-ja frederick-vs-ja changed the title [libc++][format] Decay character arrays in formatting [libc++][format] Discard contents since null-terminator in character arrays in formatting Nov 27, 2024
Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! Still a few more comments but I think this should be good to go once they've been addressed.

Comment on lines +206 to +208
} else {
// Per [format.arg]/5, the behavior is undefined because the array is not null-terminated.
return basic_format_arg<_Context>{__arg, basic_string_view<__context_char_type>{__pbegin, extent_v<_Dp>}};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is UB, instead shouldn't we do this?

if constexpr (__is_bounded_array_of<_Dp, __context_char_type>) {
  const __context_char_type* const __pzero = char_traits<__context_char_type>::find(__pbegin, extent_v<_Dp>, __context_char_type{});
  _LIBCPP_ASSERT_VALID_RANGE(__pzero != nullptr, "we'd read out of bounds");
  return basic_format_arg<_Context>{__arg, basic_string_view<__context_char_type>{__pbegin, static_cast<size_t>(__pzero - __pbegin)}};

and then drop the else that currently says // Per [format.arg]/5, the behavior is undefined because the array is not null-terminated.. WDYT?

Copy link
Member

@mordante mordante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. A few comments.

Comment on lines +36 to +41
template <class _Arr, class _Elem>
inline constexpr bool __is_bounded_array_of = false;

template <class _Elem, size_t _Len>
inline constexpr bool __is_bounded_array_of<_Elem[_Len], _Elem> = true;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these really useful? They are only used twice.

{
// https://github.com/llvm/llvm-project/issues/115935
// Contents after the embedded null character are discarded.
CharT buffer[] = {CharT('a'), CharT('b'), CharT('c'), 0, CharT('d'), CharT('e'), CharT('f'), 0};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a second test case using
CharT buffer[] = {CharT('a'), CharT('b'), CharT('c'), 0, CharT('d'), CharT('e'), CharT('f')};
(so without the final NUL character.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format C++20 std::format or std::print, and anything related to them libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[libc++] std::format reads past the 0 terminator in char[N] arrays
4 participants