Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: vis-encode all path characters except visible ASCII #985
base: main
Are you sure you want to change the base?
feat: vis-encode all path characters except visible ASCII #985
Changes from 12 commits
032b7e7
4d77429
b9996f8
ee35bb3
ae88acb
7bde12c
4c2d148
419cd8a
0a4bba6
41b8d94
e970283
522d76b
c0c49f0
7fe54b8
54694ce
1aded37
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit scared of having this volume of new code for the existing maintainers to understand and bugfix.
https://github.com/vbatts/go-mtree/tree/main/pkg/govis also has a Go implementation of the
vis(1)
BSD built-in. And @vbatts has indicated willingness to help out with our Bazel tar implementation. Do you think there's any code-sharing possible?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I recognize that this is a fairly big change set I am proposing. I'm happy to search for ways to reduce the burden. Maybe you could provide some insight into the areas you find most concerning and where efforts would do the most good? As the author, I'm a bit blinded by the curse of knowledge, so I'm not entirely sure where my energies would best be spent.
Is this more about the 171 SLOC in the
gen_vis_scripts
tool? Or about the 1069 SLOC of the entire PR? Assuming the latter, I have some ideas on how to get that down, although I'm not so sure they are improvements.For instance, the processing of 7-bit ASCII within Bazel/Starlark requires separate support code — the 103 SLOC addition to
strings.bzl
— that could be dropped by moving all escaping work out to thesed
pass. Although I think astr.maketrans
/str.translate
facility in Starlark would be independently nice to have. Maybe that should be separated though?Another thing to get the PR line count down would be the checked-in generated
.sed
scripts, totalling 588 SLOC (or ~630 SLOC if they take on the work of ASCII escaping too). Those could be codegenned in dev and published as artifacts in release, just like binary artifacts. That seemed like more work to set up and makes the scripts a bit less directly accessible, but its something we could do if the raw SLOC number is the concern.Would additional testing make you more comfortable with this change? I considered adding some Bats-based testing of the individual
sed
scripts but with PR size already a concern I wasn't sure whether that might be considered over-engineering.If there is a way to achieve meaningful code size reduction through code sharing with
govis
, I am not seeing it. It is an interesting project, but it seems to me to be just different enough that it doesn't quite suit what is called for here. FWICT, there are two root-cause differences that drive this: a difference in implementation approach, and a difference in reverse-engineering target.govis
does its conversion as a single pass — it visits each character only once — which is a nice approach that I would recommend whenever it is available. Unfortunately, to my knowledge, no such facility is available insed
(or almost1 any other POSIX utility); we only have repeated substitution available to us. IMO, that limitation is what drives the more complex behaviours in the generatedsed
scripts — the careful deconfliction of backslash and NUL handling — which has no analogue ingovis
.govis
seems to have been reverse-engineered fromvis
. The escaping behaviour implemented here has been reverse-engineered from libarchive, which has its own interpreter for escape sequences. These differ from one another ever so slightly, and also a bit from the mtree file format spec. Since the intent is to usemtree
files withbsdtar
, which uses libarchive, I felt that fidelity to the stable quirks of that implementation would be the least surprising and disruptive for callers.govis
' choice to replicatevis
also up-scopes what it does from what we actually need.mtree
's encoding for paths isn't the fully generalvis
encoding — only the octal escape sequences are part of the documented file format and libarchive tacks on a couple of additional, C-inspired escape sequences. Mostgovis
modes, faithfully reproduced fromvis
, are not interesting to us in this context.Footnotes
You might be able to do it in
awk
but (1) you'd be writing pretty much an entire parser rather than just a translation table, opportunities for code sharing would be even less, and (2)awk
has some quirks in its handling of unusual characters that make it not a great tool when we just want to process bytes. ↩