-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++][format] Discard contents since null-terminator in character arrays in formatting #116571
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-libcxx Author: A. Jiang (frederick-vs-ja) ChangesCurrently, built-in > otherwise, if The standard wording specifies that character arrays are decayed to pointers. When the null terminator is not the last element or there's no null terminator (the latter case is UB), libc++ currently produces different results. Fixes #115935. Full diff: https://github.com/llvm/llvm-project/pull/116571.diff 2 Files Affected:
diff --git a/libcxx/include/__format/format_arg_store.h b/libcxx/include/__format/format_arg_store.h
index 8b2c95c657c9bd..78b142ebcdb743 100644
--- a/libcxx/include/__format/format_arg_store.h
+++ b/libcxx/include/__format/format_arg_store.h
@@ -101,20 +101,14 @@ consteval __arg_t __determine_arg_t() {
return __arg_t::__long_double;
}
-// Char pointer
+// Char pointer or array
template <class _Context, class _Tp>
- requires(same_as<typename _Context::char_type*, _Tp> || same_as<const typename _Context::char_type*, _Tp>)
+ requires(same_as<typename _Context::char_type*, _Tp> || same_as<const typename _Context::char_type*, _Tp>) ||
+ (is_array_v<_Tp> && same_as<_Tp, typename _Context::char_type[extent_v<_Tp>]>)
consteval __arg_t __determine_arg_t() {
return __arg_t::__const_char_type_ptr;
}
-// Char array
-template <class _Context, class _Tp>
- requires(is_array_v<_Tp> && same_as<_Tp, typename _Context::char_type[extent_v<_Tp>]>)
-consteval __arg_t __determine_arg_t() {
- return __arg_t::__string_view;
-}
-
// String view
template <class _Context, class _Tp>
requires(same_as<typename _Context::char_type, typename _Tp::value_type> &&
@@ -188,15 +182,8 @@ _LIBCPP_HIDE_FROM_ABI basic_format_arg<_Context> __create_format_arg(_Tp& __valu
else if constexpr (__arg == __arg_t::__unsigned_long_long)
return basic_format_arg<_Context>{__arg, static_cast<unsigned long long>(__value)};
else if constexpr (__arg == __arg_t::__string_view)
- // Using std::size on a character array will add the NUL-terminator to the size.
- if constexpr (is_array_v<_Dp>)
- return basic_format_arg<_Context>{
- __arg, basic_string_view<typename _Context::char_type>{__value, extent_v<_Dp> - 1}};
- else
- // When the _Traits or _Allocator are different an implicit conversion will
- // fail.
- return basic_format_arg<_Context>{
- __arg, basic_string_view<typename _Context::char_type>{__value.data(), __value.size()}};
+ return basic_format_arg<_Context>{
+ __arg, basic_string_view<typename _Context::char_type>{__value.data(), __value.size()}};
else if constexpr (__arg == __arg_t::__ptr)
return basic_format_arg<_Context>{__arg, static_cast<const void*>(__value)};
else if constexpr (__arg == __arg_t::__handle)
diff --git a/libcxx/test/std/utilities/format/format.functions/format_tests.h b/libcxx/test/std/utilities/format/format.functions/format_tests.h
index b2ed6775fe8a13..33212fa663e4a5 100644
--- a/libcxx/test/std/utilities/format/format.functions/format_tests.h
+++ b/libcxx/test/std/utilities/format/format.functions/format_tests.h
@@ -3189,6 +3189,10 @@ void format_tests(TestFunction check, ExceptionTest check_exception) {
const CharT* data = buffer;
check(SV("hello 09azAZ!"), SV("hello {}"), data);
}
+ {
+ CharT buffer[] = {CharT('a'), CharT('b'), CharT('c'), 0, CharT('d'), CharT('e'), CharT('f'), 0};
+ check(SV("hello abc"), SV("hello {}"), buffer);
+ }
{
std::basic_string<CharT> data = STR("world");
check(SV("hello world"), SV("hello {}"), data);
|
6b05bb4
to
4424283
Compare
Currently, built-in `char`/`wchar_t` arrays are assumed to be null-terminated sequence with the terminator being the last element in formatting. This doesn't conform to [format.arg]/6.9. > otherwise, if `decay_t<TD>` is `char_type*` or `const char_type*`, > initializes value with `static_cast<const char_type*>(v)`; The standard wording specifies that character arrays are decayed to pointers. When the null terminator is not the last element or there's no null terminator (the latter case is UB), libc++ currently produces different results.
4424283
to
96f2f4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! Still a few more comments but I think this should be good to go once they've been addressed.
} else { | ||
// Per [format.arg]/5, the behavior is undefined because the array is not null-terminated. | ||
return basic_format_arg<_Context>{__arg, basic_string_view<__context_char_type>{__pbegin, extent_v<_Dp>}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is UB, instead shouldn't we do this?
if constexpr (__is_bounded_array_of<_Dp, __context_char_type>) {
const __context_char_type* const __pzero = char_traits<__context_char_type>::find(__pbegin, extent_v<_Dp>, __context_char_type{});
_LIBCPP_ASSERT_VALID_RANGE(__pzero != nullptr, "we'd read out of bounds");
return basic_format_arg<_Context>{__arg, basic_string_view<__context_char_type>{__pbegin, static_cast<size_t>(__pzero - __pbegin)}};
and then drop the else
that currently says // Per [format.arg]/5, the behavior is undefined because the array is not null-terminated.
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. A few comments.
template <class _Arr, class _Elem> | ||
inline constexpr bool __is_bounded_array_of = false; | ||
|
||
template <class _Elem, size_t _Len> | ||
inline constexpr bool __is_bounded_array_of<_Elem[_Len], _Elem> = true; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these really useful? They are only used twice.
{ | ||
// https://github.com/llvm/llvm-project/issues/115935 | ||
// Contents after the embedded null character are discarded. | ||
CharT buffer[] = {CharT('a'), CharT('b'), CharT('c'), 0, CharT('d'), CharT('e'), CharT('f'), 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a second test case using
CharT buffer[] = {CharT('a'), CharT('b'), CharT('c'), 0, CharT('d'), CharT('e'), CharT('f')};
(so without the final NUL character.)
Currently, built-in
char
/wchar_t
arrays are assumed to be null-terminated sequence with the terminator being the last element in formatting. This doesn't conform to [format.arg]/6.9.The standard wording specifies that character arrays are decayed to pointers. When the null terminator is not the last element or there's no null terminator (the latter case is UB), libc++ currently produces different results.
Fixes #115935.