Skip to content

Conversation

@cakebaker
Copy link
Contributor

@cakebaker cakebaker commented Dec 8, 2025

Currently, we output the number separator as a lossy string:

$ printf "a\nb" | cargo run -q nl --number-separator=$'\xFF' | hexdump -C
00000000  20 20 20 20 20 31 ef bf  bd 61 0a 20 20 20 20 20  |     1...a.     |
00000010  32 ef bf bd 62 0a                                 |2...b.|
00000016

Whereas GNU nl doesn't:

$ printf "a\nb" | nl --number-separator=$'\xFF' | hexdump -C
00000000  20 20 20 20 20 31 ff 61  0a 20 20 20 20 20 32 ff  |     1.a.     2.|
00000010  62 0a                                             |b.|
00000012

This PR fixes the issue.

Update: The PR also fixes the same issue for the content. Our current output:

$ printf $'\xFF' | cargo run -q nl | hexdump -C
00000000  20 20 20 20 20 31 09 ef  bf bd 0a                 |     1.....|
0000000b 

The output of GNU nl:

$ printf $'\xFF' | nl | hexdump -C
00000000  20 20 20 20 20 31 09 ff  0a                       |     1...|
00000009

@codspeed-hq
Copy link

codspeed-hq bot commented Dec 8, 2025

CodSpeed Performance Report

Merging #9601 will improve performances by 66.23%

Comparing cakebaker:nl_fix_non_utf8_number_separator (db52eac) with main (aaa0610)

Summary

⚡ 2 improvements
✅ 125 untouched
⏩ 6 skipped1

Benchmarks breakdown

Benchmark BASE HEAD Change
nl_large_file[10] 100.9 ms 61.2 ms +65.03%
nl_many_lines[100000] 79.8 ms 48 ms +66.23%

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@cakebaker cakebaker marked this pull request as draft December 8, 2025 13:59
@cakebaker cakebaker force-pushed the nl_fix_non_utf8_number_separator branch from d8bdce0 to aea0985 Compare December 8, 2025 14:39
@github-actions
Copy link

github-actions bot commented Dec 8, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/cksum/b2sum is no longer failing!

@cakebaker cakebaker marked this pull request as ready for review December 8, 2025 14:58
@cakebaker cakebaker force-pushed the nl_fix_non_utf8_number_separator branch from aea0985 to db52eac Compare December 8, 2025 15:21
@github-actions
Copy link

github-actions bot commented Dec 8, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/cksum/b2sum is no longer failing!

@ChrisDryden
Copy link
Contributor

I was hoping we could create a macro that is similar to the new_ucmd that allows us to validate using the args and and then make sure that the stderr, stdout and error code match with the GNU implementation, I believe that would have highlighted this issue earlier right?

writeln!(
writer,
"{}{}{}",
let mut buf = Vec::with_capacity(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing performance improvement, we should look for more changes like this in the codebase. For anyone looking in the future for more perf improvements its now at the point that a big percentage of the time is spent freeing the vec, and if you do .clear it actually just sets the len to 0 so that you can just use the same buffer in the loop and add more capacity if needed by checking buf.capacity().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants