What kind of failures were you running into? On this branch if I run
Force-pushed the branch from 0e70df4 to 03ecfdb.
I think the source of all my 26 errors is … It's like I am missing some initial setup.
That's interesting, as this might mean something similar to what I encountered in the … This failure means there is more than expected left on the wire. In … I have force-pushed the adjustment in case you want to give it another try.
Signed-off-by: Sebastian Thiel <[email protected]>
… a pack

The pack-writer perfectly consumes all input of the pack without overshoot, which it can do because it knows how many objects it ought to read. Thus, the last packetline of the pack is consumed to the last byte without ever asking for more data, a request that would then be denied as the flush packet is encountered. This needs to be accounted for in the calling code.

Signed-off-by: Sebastian Thiel <[email protected]>
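For readers less familiar with the framing involved: git's pkt-line format prefixes every packet with four hexadecimal digits giving its total length, including those four bytes, and the flush packet is the bare length 0000 with no payload. A minimal encoder sketch, assuming only those basic rules and git's 65520-byte maximum packet size (no sideband, no protocol-v2 delimiter packets); encode_pkt_line and FLUSH_PKT are illustrative names, not gix APIs:

```rust
/// The flush packet: a length of zero and no payload.
const FLUSH_PKT: &[u8] = b"0000";

/// Largest pkt-line git accepts, including the 4-byte length header.
const MAX_PKT_LEN: usize = 65520;

/// Encode `payload` as a single pkt-line: a 4-digit lowercase hex length
/// (which counts the 4 header bytes too) followed by the payload itself.
/// Returns `None` if the payload is too large for one packet.
fn encode_pkt_line(payload: &[u8]) -> Option<Vec<u8>> {
    let total = payload.len() + 4;
    if total > MAX_PKT_LEN {
        return None;
    }
    let mut out = format!("{:04x}", total).into_bytes();
    out.extend_from_slice(payload);
    Some(out)
}

fn main() {
    // A one-line response followed by the flush packet that terminates it.
    let mut wire = encode_pkt_line(b"hello\n").unwrap();
    wire.extend_from_slice(FLUSH_PKT);
    // "hello\n" is 6 bytes, so the header is 6 + 4 = 10 = 0x000a.
    assert_eq!(wire, b"000ahello\n0000");
    println!("{}", String::from_utf8_lossy(&wire));
}
```

A reader that knows it needs exactly one data line can stop right after "hello\n", leaving the trailing 0000 unread for whoever reads next.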
Force-pushed the branch from 03ecfdb to 4ace976.
);
// Consume anything that might still be left on the wire - this is 'EOF' most of the time,
// but some tests have 'garbage' here as well.
std::io::copy(&mut reader, &mut std::io::sink())?;
I suppose this is quite similar to how I was doing it before, except we're not bothering to check if it's the flush packet or not.
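Draining into a sink discards the leftovers without looking at them. If one did want to tell "nothing left", "only the flush packet" and "something else" apart, a minimal sketch could read everything that remains and classify it. This assumes the caller has a plain std::io::Read over the raw wire bytes; Leftover and classify_leftover are hypothetical names, not radicle-fetch or gix APIs:

```rust
use std::io::{self, Read};

/// What was still on the wire after the pack was consumed.
#[derive(Debug, PartialEq)]
enum Leftover {
    /// Nothing at all: the stream was already at EOF.
    Eof,
    /// Exactly one flush packet ("0000") and nothing else.
    FlushOnly,
    /// Anything else, returned verbatim for inspection.
    Other(Vec<u8>),
}

/// Read `reader` to EOF and classify whatever was left.
fn classify_leftover<R: Read>(mut reader: R) -> io::Result<Leftover> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(match buf.as_slice() {
        [] => Leftover::Eof,
        b"0000" => Leftover::FlushOnly,
        _ => Leftover::Other(buf),
    })
}

fn main() -> io::Result<()> {
    // `&[u8]` implements `Read`, which stands in for the transport here.
    println!("{:?}", classify_leftover(&b""[..])?);
    println!("{:?}", classify_leftover(&b"0000"[..])?);
    println!("{:?}", classify_leftover(&b"0008abcd"[..])?);
    Ok(())
}
```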
FWIW, I tried running the test suite with cargo test --all and it worked 👍
Good to hear!
Actually, it doesn't seem to always be the flush packet, otherwise there wouldn't have been that one test failure. Once my tests are working, I will look at the contents that it's copying; maybe it's the same 'garbage' bytes that I have been seeing.
Hmmmm, what OS are you using? From a quick look around, it seems the Radicle socket paths are too long on your system: https://unix.stackexchange.com/questions/367008/why-is-socket-path-length-limited-to-a-hundred-chars
It looks like the BSDs, and thus MacOS, have a 104-character limit. I am on MacOS 14.1.1, so it's probably that. Is there a chance to make the socket path shorter on MacOS? I investigated real quick and it does turn out that setting … This is what I tried:
Which fails with:
Even when running the tests with … When looking at a detailed error log, at least it becomes clear why the socket paths are too long: I presume it would create them in the temporary directory…
… To make this work, I think on MacOS there would have to be a special case, at least for the tests. Would you like me to implement that as well? Happy to contribute this, as I really do want to run the tests successfully :).
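The limit comes from the fixed-size sun_path field of struct sockaddr_un, which is 104 bytes on macOS and the BSDs and 108 bytes on Linux, and that size includes the terminating NUL. A minimal sketch of a pre-flight check a test harness might run before binding, assuming those two sizes; socket_path_fits, choose_socket_path and the /tmp fallback are hypothetical, not something Radicle provides:

```rust
use std::path::{Path, PathBuf};

/// Assumed size of `sockaddr_un.sun_path`: 104 bytes on macOS and the BSDs,
/// 108 bytes elsewhere (e.g. Linux). The size includes the trailing NUL.
const SUN_PATH_LEN: usize = if cfg!(any(
    target_os = "macos",
    target_os = "freebsd",
    target_os = "openbsd",
    target_os = "netbsd"
)) {
    104
} else {
    108
};

/// True if `path` plus its terminating NUL fits into `sun_path_len` bytes.
fn socket_path_fits(path: &Path, sun_path_len: usize) -> bool {
    path.as_os_str().len() < sun_path_len
}

/// Hypothetical fallback: keep the preferred location if the resulting path
/// fits, otherwise put the socket under the much shorter `/tmp`.
fn choose_socket_path(preferred_dir: &Path, file_name: &str) -> PathBuf {
    let candidate = preferred_dir.join(file_name);
    if socket_path_fits(&candidate, SUN_PATH_LEN) {
        candidate
    } else {
        Path::new("/tmp").join(file_name)
    }
}

fn main() {
    // A deeply nested directory, similar in spirit to a long temporary directory.
    let long_dir = PathBuf::from("/var/folders").join("x".repeat(120));
    let path = choose_socket_path(&long_dir, "node.sock");
    println!("{}", path.display()); // /tmp/node.sock on both platforms
}
```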
Hi @Byron, so I was also running into this issue on MacOS and I solved it by defining the …
Thanks for the hint, @sebastinez! Thanks to it, I am now down to 3 failures, which seem (more) legit. But since you are running MacOS as well, I wonder why you (probably) don't see them.
The last one occurs for 35 tests in …

@FintanH, regarding the two failures of the previous version of the line that would assert with

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..4558df49 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -269,7 +269,7 @@ where
);
// Consume anything that might still be left on the wire - this is 'EOF' most of the time,
// but some tests have 'garbage' here as well.
- std::io::copy(&mut reader, &mut std::io::sink())?;
+ assert!(reader.peek_data_line().is_none());
assert_eq!(
reader.stopped_at(),
Some(MessageKind::Flush),

…they are not reproducing for me. I only get my expected 37 errors, but none in … That's strange, except that maybe this is related to the …
When I learn about your …
Hey @Byron, so I'm on git version …
But I just ran the test suite (it had been a long time since I last ran it), and I get the same errors you are getting.
For this case, it's an oversight in the test suite that I have a fix for here. Essentially, you have a different … Regarding this failure:
I actually witnessed this running on my Mac recently. We have no fix and it's an annoying race condition, so in this case it can be ignored. However, this one is new and I'd need to dig into what's going on:
Indeed, I am on a newer version of git:
Ah, good to hear it's something trivial. In the
Ok, I will keep that in mind. Thus far, it was so consistent that I wasn't able to identify it as such. There are at least two other tests, one with
I dug in yesterday but forgot to share my findings. I could reproduce it with:
❯ sed -i 's/Hello World/Hello Radicle/' main.c
sed: 1: "main.c": invalid command code m
…and I think the MacOS fix for 'edit in place - save no backup' is:
Of course, I wouldn't know yet how to make such a change in the sandbox test description except by copying it entirely, so I will leave the fix to you. Regarding you seeing two test failures with … This leaves me thinking that maybe this failure you observed is related to some other raciness that somehow leaves (or puts) data after the pack itself. Maybe it's platform-related as well, and it would be interesting to know if this reproduces reliably for you, maybe even on different platforms.
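For context on the sed error: GNU sed treats the backup suffix of -i as optional, while BSD sed (as shipped with macOS) requires it as the next argument. With BSD sed, the expression is therefore taken as the suffix and main.c as the script, which yields "invalid command code m"; passing an explicit empty suffix (-i '') means "edit in place, no backup" there. A minimal sketch of how a test helper might pick the arguments, assuming it shells out to sed via std::process::Command and that non-macOS hosts have GNU sed; sed_in_place_args is a hypothetical name:

```rust
use std::process::Command;

/// Build the argument list for an in-place, no-backup `sed` substitution.
/// BSD sed (macOS) needs a separate, here empty, suffix argument after -i;
/// GNU sed must not get one, or it would treat the '' as the script.
fn sed_in_place_args(bsd_sed: bool, expr: &str, file: &str) -> Vec<String> {
    let mut args = vec!["-i".to_string()];
    if bsd_sed {
        // Empty backup suffix: edit in place without keeping a backup.
        args.push(String::new());
    }
    args.push(expr.to_string());
    args.push(file.to_string());
    args
}

fn main() {
    // Assumption: BSD-style sed on macOS, GNU sed everywhere else.
    let bsd = cfg!(target_os = "macos");
    let args = sed_in_place_args(bsd, "s/Hello World/Hello Radicle/", "main.c");
    let mut cmd = Command::new("sed");
    cmd.args(&args);
    // Only printed here; `cmd.status()` would actually run it.
    println!("{:?}", cmd);
}
```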
Ah, something similar happened to me where
Interesting, I can rerun it a few more times. I think I'll ask some folks to try it out too, since we're all running a host of different Linux OSes!
@Byron, can you push a version that includes the assertions? I just realised that the current version is still the one with the sink.
I could have, but I didn't want to keep spamming force-pushes, which also trigger notifications, and instead opted to provide a patch, which is reproduced here:

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..4558df49 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -269,7 +269,7 @@ where
);
// Consume anything that might still be left on the wire - this is 'EOF' most of the time,
// but some tests have 'garbage' here as well.
- std::io::copy(&mut reader, &mut std::io::sink())?;
+ assert!(reader.peek_data_line().is_none());
assert_eq!(
reader.stopped_at(),
Some(MessageKind::Flush),

If you want me to force-push nonetheless or push a separate commit for this change, I will of course (it's just that the current version is the known good one that I didn't want to meddle with for trying things).
The patch works :) I meant just push a separate branch, though 😁
Alright, I just pushed a new commit then, which is meant for later removal, knowing that I have now caused the opposite of what I originally intended: to reduce notifications 😅. By the way, please feel free to push or force-push directly into this branch if it helps with testing in any way; no need to go through me. I find reviews are much faster and easier that way. When checked out with …
Thanks for that! So I got some help from colleagues, and you can see the thread here: https://radicle.zulipchat.com/#narrow/stream/369277-heartwood/topic/debugging.20gitoxide.20packets
I realised that you might not have seen errors because if … You can see in the thread that rudolfs, running MacOS and a recent git version, runs into the panic by running … Lars also ends up getting the panic on Debian.
I was also running the tests like this to bypass … At this point I think one could do all or any of the following:
Maybe something else should be done as well; just let me know where we are heading :).
Why not both 😛 I'm happy to take the 'good' version and merge it in :) I'll push the patch via Radicle and can add the link back to here. Tagging @rudolfs to see if he has time to check out the leftover bytes. I'm also happy to do it on my system too 👍
Force-pushed the branch from 539f96b to 4ace976.
'All of the following' it is :). I have just reverted the commit with the test code, and here is a patch which does show the remaining bytes left in the channel (I only see an empty string there, though):

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..5b45e927 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -1,3 +1,4 @@
+use bstr::ByteSlice;
use std::{
borrow::Cow,
io::{self, BufRead},
@@ -269,7 +270,9 @@ where
);
// Consume anything that might still be left on the wire - this is 'EOF' most of the time,
// but some tests have 'garbage' here as well.
- std::io::copy(&mut reader, &mut std::io::sink())?;
+ let mut buf = Vec::new();
+ std::io::copy(&mut reader, &mut buf)?;
+ dbg!(buf.as_bstr());
assert_eq!(
reader.stopped_at(),
Some(MessageKind::Flush),

I hope that helps to get closer to solving that riddle 😅. Maybe contributing here isn't the preferred way of doing so; is there something I missed on how to do the same on Radicle? What I am missing in particular is CI, as I can't really gauge the quality of my commits while there are local test failures.
So I was trying to use the …

let mut buf = Vec::new();
assert!(reader.peek_data_line().is_none(), "buf={:?}", {
    std::io::copy(&mut reader, &mut buf)?;
    buf
});

Here's the interesting part...
So it seems like …
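A side note on this pattern: assert! only evaluates its message arguments when the condition is false, because it expands to roughly if !cond { panic!(...) }. So the copy into buf above only runs when peek_data_line() returned something; when the assertion holds, nothing is drained. A minimal, self-contained sketch demonstrating that evaluation order; check is a hypothetical helper that records whether its message block ran:

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

/// Asserts `cond`; the message block records whether it was evaluated.
fn check(cond: bool, evaluated: &Cell<bool>) {
    assert!(cond, "message built: {}", {
        // Side effect inside the message: only runs if `cond` is false.
        evaluated.set(true);
        "details"
    });
}

fn main() {
    let evaluated = Cell::new(false);
    check(true, &evaluated);
    // The assertion held, so the message block never ran.
    assert!(!evaluated.get());

    // A failing assertion does evaluate the message block before panicking.
    let failed = Cell::new(false);
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| check(false, &failed)));
    assert!(outcome.is_err());
    assert!(failed.get());
    println!("message arguments are evaluated lazily");
}
```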
Aye, we create patches using the … I've created a patch. Unfortunately, we don't have review and conversational elements yet; that happens on Zulip currently.
Thanks for trying the patch, and for the pointer for trying the …
If the code presented in the comment is the one to draw that conclusion from, then I have a different view on it. This seems to indicate, though, that having bytes left in there is racy? Maybe @rudolfs has more luck with it; seeing the bytes might help to learn where they are from. And since there now is the Radicle version of this PR, I think I will close this :).
Perhaps I'm wrong, but the …
Are you saying that the …
I enthusiastically exhale that I got this wrong. What I described is what I thought it would be, and what you described is what it actually is. When it encounters EOF during peek, it doesn't return … Maybe the 'extra bytes' were that bug all along?
Hehe, this happens to me all the time too. I'm thinking it might be that we were unnecessarily panicking due to the …
And digging into this, there are tests which validate that behaviour, too. But running into this here clearly shows it should be improved, if only so that the documentation makes clear when to expect … In the end, if there are no bytes available, we learn nothing new :/.
Actually, I have no explanation for this; it should return … Maybe you could give it one last-ditch attempt and do … Thanks again!
Ok, I ran with

assert_eq!(
    reader.stopped_at(),
    Some(MessageKind::Flush),
    "the flush packet was now consumed"
);

The output of the …

[radicle-fetch/src/transport/fetch.rs:273] reader.peek_data_line() = Some(
    Ok(
        Ok(
            [
                2, 84, 111, 116, 97, 108, 32, 53, 32, 40, 100, 101, 108, 116, 97, 32, 49, 41, 44, 32,
                114, 101, 117, 115, 101, 100, 32, 48, 32, 40, 100, 101, 108, 116, 97, 32, 48, 41, 44, 32,
                112, 97, 99, 107, 45, 114, 101, 117, 115, 101, 100, 32, 48,
            ],
        ),
    ),
)
Can you imagine an archeologist who keeps digging, thinking it's going to be something big, and then all that's showing is a shard of a coffee mug, or something equally benign 😁? The playground revealed the decoded message is:
The … At least that riddle is solved now, case closed :D.
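The dumped bytes are a sideband-multiplexed payload: with git's side-band-64k capability, the first byte names the channel (1 = pack data, 2 = progress messages, 3 = fatal error) and the rest is the message. A minimal sketch that decodes the bytes from the dbg! output above by hand, assuming exactly that layout; Band and decode_sideband are illustrative names, not gix APIs. Channel 2 plus the text shows this is the summary line git reports after writing a pack, not pack data:

```rust
/// Sideband channels as defined by git's side-band(-64k) capability.
#[derive(Debug, PartialEq)]
enum Band {
    Data,
    Progress,
    Error,
}

/// Split a sideband payload into its channel and message bytes.
fn decode_sideband(payload: &[u8]) -> Option<(Band, &[u8])> {
    let (&band, rest) = payload.split_first()?;
    let band = match band {
        1 => Band::Data,
        2 => Band::Progress,
        3 => Band::Error,
        _ => return None,
    };
    Some((band, rest))
}

fn main() {
    // The bytes from the dbg! output above.
    let payload: &[u8] = &[
        2, 84, 111, 116, 97, 108, 32, 53, 32, 40, 100, 101, 108, 116, 97, 32, 49, 41, 44, 32,
        114, 101, 117, 115, 101, 100, 32, 48, 32, 40, 100, 101, 108, 116, 97, 32, 48, 41, 44, 32,
        112, 97, 99, 107, 45, 114, 101, 117, 115, 101, 100, 32, 48,
    ];
    let (band, msg) = decode_sideband(payload).expect("valid sideband payload");
    println!("{:?}: {}", band, String::from_utf8_lossy(msg));
    // prints: Progress: Total 5 (delta 1), reused 0 (delta 0), pack-reused 0
}
```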
Note

gitoxide: Suspect parsing of the packfile GitoxideLabs/gitoxide#972
gix: fix: Allow multiple packs to be received one after another. (#972) GitoxideLabs/gitoxide#1107

The server implementation here is the first which actually used the capabilities of the V2 git protocol to perform multiple pack-fetches in a row through the same connection, which isn't currently tested in gitoxide.

Thus it ran into the issue that the pack-writer will read a pack 'perfectly', i.e. without encountering EOF, which would be encoded by the FLUSH packet (0000 on the wire).

For that reason, the command run after a pack was received would still see the flush packet and immediately assume it's done. As the pack-resolver, for good reason, doesn't know anything about this, it's up to the fetch implementation to ensure the transport has been drained of any remaining bytes.
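A minimal, self-contained simulation of that failure mode, assuming a bare pkt-line stream without sideband: two responses are sent back to back on one connection, each terminated by a flush packet. If the first consumer stops right after the last data line it expects, the second consumer's first read is the first response's flush packet, and it concludes there is nothing to read. Consuming up to and including the flush before handing the stream on avoids that. Pkt, read_pkt and read_until_flush are hypothetical helpers, not gix APIs:

```rust
use std::io::{self, Read};

/// One decoded pkt-line: either a data line or the flush packet "0000".
#[derive(Debug, PartialEq)]
enum Pkt {
    Data(Vec<u8>),
    Flush,
}

/// Read a single pkt-line; `Ok(None)` means EOF before a full header.
/// Protocol-v2 delim (0001) and response-end (0002) packets are not handled.
fn read_pkt<R: Read>(r: &mut R) -> io::Result<Option<Pkt>> {
    let mut header = [0u8; 4];
    match r.read_exact(&mut header) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let invalid = || io::Error::new(io::ErrorKind::InvalidData, "bad pkt-line length");
    let hex = std::str::from_utf8(&header).map_err(|_| invalid())?;
    let len = usize::from_str_radix(hex, 16).map_err(|_| invalid())?;
    if len == 0 {
        return Ok(Some(Pkt::Flush));
    }
    if len < 4 {
        return Err(invalid());
    }
    let mut data = vec![0u8; len - 4];
    r.read_exact(&mut data)?;
    Ok(Some(Pkt::Data(data)))
}

/// Read data lines until (and including) the terminating flush packet.
fn read_until_flush<R: Read>(r: &mut R) -> io::Result<Vec<Vec<u8>>> {
    let mut lines = Vec::new();
    while let Some(pkt) = read_pkt(r)? {
        match pkt {
            Pkt::Data(d) => lines.push(d),
            Pkt::Flush => break,
        }
    }
    Ok(lines)
}

fn main() -> io::Result<()> {
    // Two responses on one connection, each terminated by a flush packet.
    let wire: &[u8] = b"0009pack100000009pack20000";

    // Buggy: the first consumer reads exactly the one line it expects...
    let mut r = wire;
    assert_eq!(read_pkt(&mut r)?, Some(Pkt::Data(b"pack1".to_vec())));
    // ...so the second consumer immediately hits the first response's flush.
    assert!(read_until_flush(&mut r)?.is_empty());

    // Fixed: drain the first response through its flush before moving on.
    let mut r = wire;
    assert_eq!(read_pkt(&mut r)?, Some(Pkt::Data(b"pack1".to_vec())));
    assert_eq!(read_pkt(&mut r)?, Some(Pkt::Flush));
    assert_eq!(read_until_flush(&mut r)?, vec![b"pack2".to_vec()]);
    println!("second response read correctly after draining the flush");
    Ok(())
}
```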
Review Notes

… gix. This is an opportunity for improvements, but it needs communication to figure out what this could look like.
cargo test -p radicle-node tests::e2e failed before the fix (and after removing the previous fix), and works now, so I am optimistic nothing is broken by it.