support multiple fetches in a row #556

Byron · 2023-11-12T12:15:47Z

Note

Related issue in gitoxide: Suspect parsing of the packfile GitoxideLabs/gitoxide#972
Related PR with improvement in gix: fix: Allow multiple packs to be received one after another. (#972) GitoxideLabs/gitoxide#1107

The server implementation here is the first which actually used the capabilities of the V2
git protocol to perform multiple pack-fetches in a row through the same connection, which
isn't currently tested in gitoxide.

Thus it ran into the issue that the pack-writer will read a pack 'perfectly', i.e. without
encountering EOF, which would be encoded by the FLUSH packet (0000 on the wire).

For that reason the command run after a pack was received would still see the flush packet
and immediately assume it's done.

As the pack-resolver for good reason doesn't know anything about this, it's up to the
fetch implementation to assure the transport has been drained of any remaining bytes.

Review Notes

The implementation plays it very safe with 'all-assertions'. I can imagine this to be a bit too much.
I left a TODO to get rid fo the Delegate-style implementation which I failed to use myself in gix.
This is an opportunity for improvements, but needs communication to figure out how this could look like.
I didn't see CI kicking in, which I tried to check while this was a draft PR. However, locally on MacOS I have 61 failed tests, and validated that it didn't get worse. cargo test -p radicle-node tests::e2e failed before the fix (and after removing the previous fix), and works now, so I am optimistic nothing is broken by it.

FintanH · 2023-11-13T09:19:20Z

I didn't see CI kicking in, which I tried to check while this was a draft PR. However, locally on MacOS I have 61 failed tests, and validated that it didn't get worse. cargo test -p radicle-node tests::e2e failed before the fix (and after removing the previous fix), and works now, so I am optimistic nothing is broken by it.

What kind of failures were you running into?

On this branch if I run cargo test -p radilce-node -- --nocapture or cargo test -p radicle-cli -- --nocapture will fail by panicking at one of the assertions:

thread 'z6Mkveba9Mgj6YrpTJpm68YtrjAaZBaPzcjmGX6h7YhfBzD5 <worker#1>    ' panicked at 'the flush packet indicates there is nothing to peek', /home/haptop/Developer/heartwood/radicle-fetch/src/transport/fetch.rs:270:13

Byron · 2023-11-13T09:50:36Z

What kind of failures were you running into?

I think the source of all my 26 errors is Io(Error { kind: InvalidInput, message: "path must be shorter than SUN_LEN" }), and I didn't look into what that could mean yet. That's also the reason I didn't see the legitimate error, from what I can tell the SUN_LEN error happens before that when running cargo test -p radicle-cli -- --nocapture.

It's like I am missing some initial setup.

thread 'z6Mkveba9Mgj6YrpTJpm68YtrjAaZBaPzcjmGX6h7YhfBzD5 <worker#1>    ' panicked at 'the flush packet indicates there is nothing to peek', /home/haptop/Developer/heartwood/radicle-fetch/src/transport/fetch.rs:270:13

That's interesting, as this might mean something similar to what I encountered in the gitoxide test-suite with tests that deal with shallow clones. There it would end up seeing two bytes, 1 and 30 and didn't investigate what these might be.

This failure means there is more than expected left on the wire. In gitoxide I just ended up consuming all of it. For now I have adjusted the code blindly (but with working e2e tests) to implement that. However I plan to investigate the left-over bytes the one test is encountering once I can run it.

I have force-pushed the adjustment in case you want to give it another try.

Signed-off-by: Sebastian Thiel <[email protected]>

… a pack The pack-writer perfectly consumes all input of the pack without overshoot, which it can as it knows how many objects it ought to read. Thus, the last packetline of the pack is consumed to the last byte without ever asking for more data that would then be denied as the flush packet is encountered. This needs to be accounted for in the calling code. Signed-off-by: Sebastian Thiel <[email protected]>

FintanH · 2023-11-13T13:56:12Z

radicle-fetch/src/transport/fetch.rs

+            );
+            // Consume anything that might still be left on the wire - this is 'EOF' most of the time,
+            // but some tests have 'garbage' here as well.
+            std::io::copy(&mut reader, &mut std::io::sink())?;


I suppose this is quite similar to how I was doing it before, except we're not bothering to check if it's the flush packet or not.

fwiw, I tried running the test suite with cargo test --all and it worked 👍

Good to hear!

Actually it doesn't seem to always be the flush package, otherwise there wouldn't have been that one test failure. Once my tests are working, I will look at the contents that it's copying, maybe it's the same 'garbage' bytes that I have been seeing.

FintanH · 2023-11-13T13:59:02Z

I think the source of all my 26 errors is Io(Error { kind: InvalidInput, message: "path must be shorter than SUN_LEN" }), and I didn't look into what that could mean yet. That's also the reason I didn't see the legitimate error, from what I can tell the SUN_LEN error happens before that when running cargo test -p radicle-cli -- --nocapture.

Hmmmm what OS are you using? From a quick look around it seems to be the Radicle socket paths are too long on your system https://unix.stackexchange.com/questions/367008/why-is-socket-path-length-limited-to-a-hundred-chars

Byron · 2023-11-13T15:21:13Z

It looks like BSDs, and so MacOS, have a 104 character limit. I am on MacOS 14.1.1, so probably it's that. Is there a chance to make the socket path shorter on MacOS?

I investigated real quick and it does turn out that setting RAD_SOCKET=/tmp/rad makes more tests pass. Those who don't are unhappy that the socket is already running. Strangely, even with when making tests single-threaded, it still fails like that as if the socket-device/file isn't removed after the test runs.

This is what I tried:

RAD_SOCKET=/tmp/rad_socket cargo test --all -- --test-threads=1

Which fails with:

'test_replication_via_seed' panicked at 'called `Result::unwrap()` on an `Err` value: AlreadyRunning("/tmp/rad_socket")', /Users/byron/dev/github.com/radicle-dev/heartwood/radicle-node/src/test/environment.rs:379:10

Even when running the tests with RAD_SOCKET=/tmp/rad_socket cargo nextest run -j1 --no-fail-fast, which typically does a better job at coherently running things, it still stumbles over the socket.

When looking at a detailed error log, at least it becomes clear why the sockets are too long - I presume it would create them in the temporary directory…

running 1 test
test: Node::init alice: z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y
test: Node::init bob: z6MkpLDPEXJAdgnZ2xLsJFotNNZCw4CcFEgddpZhRWzT8g6E
1699888427 tests::e2e::test_replication_ref_in_sigrefs radicle_cob::change_graph: loading object 'xyz.radicle.id' '7d16f93be9371c051505352f34c150fe5d12bdc9'
1699888427 tests::e2e::test_replication_ref_in_sigrefs   storage: Setting ref: refs/rad/id -> 7d16f93be9371c051505352f34c150fe5d12bdc9
1699888427 tests::e2e::test_replication_ref_in_sigrefs   storage: Setting ref: refs/heads/master -> f2de534b5e81d7c6e2dcaf58c3dd91573c0a0354
1699888427 tests::e2e::test_replication_ref_in_sigrefs   storage: Setting ref: HEAD -> refs/heads/master
test: Initialized project rad:z2wxtY8oFfoNhJrX84GKCEqmnjo5j for node z6MkpLDPEXJAdgnZ2xLsJFotNNZCw4CcFEgddpZhRWzT8g6E
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening address book /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/addresses.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening gossip store from /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/addresses.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening routing table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/routing.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening tracking policy table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/tracking.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Default tracking policy set to 'block'
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Initializing service (Test)..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Address book is empty. Adding bootstrap nodes..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: 0 nodes added to address book
1699888427 tests::e2e::test_replication_ref_in_sigrefs   service: Init @1699888427000
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Listening on 0.0.0.0:59216..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           node: Running node z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y in /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           node: Binding control socket /tmp/rad_socket..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <control>        control: Control thread listening on socket..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening address book /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/addresses.db..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <service>        service: Received command QueryState(..)
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening gossip store from /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/addresses.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening routing table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/routing.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening tracking policy table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/tracking.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Default tracking policy set to 'block'
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Initializing service (Test)..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Address book is empty. Adding bootstrap nodes..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: 0 nodes added to address book
1699888427 tests::e2e::test_replication_ref_in_sigrefs   service: Init @1699888427000
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Listening on 0.0.0.0:59217..
test: Node z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y shutting down..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <control>        control: Received `{"type":"shutdown"}` on control socket
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <control>        control: Shutdown requested..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           pool: Worker pool shutting down..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           node: Node shutdown completed for z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y
test tests::e2e::test_replication_ref_in_sigrefs ... FAILED

failures:

failures:
    tests::e2e::test_replication_ref_in_sigrefs

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 93 filtered out; finished in 0.13s


--- STDERR:              radicle-node tests::e2e::test_replication_ref_in_sigrefs ---
thread 'tests::e2e::test_replication_ref_in_sigrefs' panicked at 'called `Result::unwrap()` on an `Err` value: AlreadyRunning("/tmp/rad_socket")', radicle-node/src/test/environment.rs:379:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

… /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/a which seems pretty long indeed :).

To make this work, I think on MacOS there would have to be a special case at least for the tests - would you like me to implement that as well? Happy to contribute this as I really do want to run the tests successfully :).

sebastinez · 2023-11-14T06:20:56Z

Hi @Byron, so I was also running into this issue on MacOS and I solved it by defining the TMPDIR variable to make the path shorter.

Byron · 2023-11-14T19:03:41Z

Thanks for the hint, @sebastinez, thanks to it I am now down to 3 failures which seem (more) legit. But since you are running MacOS as well I wonder why you (probably) don't see them.

--- STDERR:              radicle-cli::commands rad_diff ---
thread 'rad_diff' panicked at 'Expected success, was 1
stderr:
sed: 1: "main.c": invalid command code m

', radicle-cli-test/src/lib.rs:472:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

--- STDERR:              radicle-cli::commands rad_patch_pull_update ---
thread 'rad_patch_pull_update' panicked at '
--- Expected
++++ actual:   stdout
   1      - {"type":"refsSynced","remote":"z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi","rid":"rad:zhbMU4DUXrzB8xT6qAJh6yZ7bFMK"}
        1 + ✗ Error: timed out reading from control socket

Exit status: 1
', radicle-cli-test/src/lib.rs:468:36

--- STDERR:              radicle-httpd api::v1::stats::routes::test_stats ---
thread 'api::v1::stats::routes::test_stats' panicked at 'called `Result::unwrap()` on an `Err` value: Git(Error { code: -1, klass: 4, message: "src refspec 'refs/heads/master' does not match any existing object" })', radicle-httpd/src/test.rs:194:6

The last one occours for 35 tests in radicle-httpd, and I only post this here for posterity as I am offering to look into making this work on MacOS if there is interest.

@FintanH, regarding the two failures of the previous version of the line that would assert with peek_data_line()…

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..4558df49 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -269,7 +269,7 @@ where
             );
             // Consume anything that might still be left on the wire - this is 'EOF' most of the time,
             // but some tests have 'garbage' here as well.
-            std::io::copy(&mut reader, &mut std::io::sink())?;
+            assert!(reader.peek_data_line().is_none());
             assert_eq!(
                 reader.stopped_at(),
                 Some(MessageKind::Flush),

…are not reproducing for me. I only get my expected 37 errors, but none in radicle-node for example, where you definitely saw that assertion fail.

That's strange, except that maybe this is related to the git installed on the system, which probably still is the one building packs. I have this one, maybe yours is newer?

git version 2.39.3 (Apple Git-145)

When I learn about your git version, I can retry with that and see if it reproduces then.

sebastinez · 2023-11-14T22:20:08Z

Hey @Byron so I'm on git version

git version 2.42.1

But I just ran the test suite, long time ago since I ran it the last time, and I get the same errors you are getting.

FintanH · 2023-11-15T08:25:42Z

The last one occours for 35 tests in radicle-httpd, and I only post this here for posterity as I am offering to look into making this work on MacOS if there is interest.

For this case, it's an oversight in the test suite that I have a fix for here. Essentially, you have a different init.defaultBranch value than master and it was assumed in the test suite that it would be master.

Regarding this failure:

--- STDERR:              radicle-cli::commands rad_patch_pull_update ---
thread 'rad_patch_pull_update' panicked at '
--- Expected
++++ actual:   stdout
   1      - {"type":"refsSynced","remote":"z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi","rid":"rad:zhbMU4DUXrzB8xT6qAJh6yZ7bFMK"}
        1 + ✗ Error: timed out reading from control socket

Exit status: 1
', radicle-cli-test/src/lib.rs:468:36

I actually witnessed this running on my Mac recently. We have no fix and it's an annoying race condition, so in this case it can be ignored.

However, this one is new and I'd need to dig into what's going on:

--- STDERR:              radicle-cli::commands rad_diff ---
thread 'rad_diff' panicked at 'Expected success, was 1
stderr:
sed: 1: "main.c": invalid command code m

', radicle-cli-test/src/lib.rs:472:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

When I learn about your git version, I can retry with that and see if it reproduces then.

Indeed, I am on a newer version of git:

git version 2.40.1

Byron · 2023-11-15T09:20:51Z

For this case, it's an oversight in the test suite that I have a fix for here. Essentially, you have a different init.defaultBranch value than master and it was assumed in the test suite that it would be master.

Ah, good to hear it's something trivial. In the gitoxide test-suite, I do my best to prevent git from picking up anything more than what's local to the repository at hand, and to control a few key variables - these have accumulated over time and might be useful here as well. Here is how this looks like right now: https://github.com/Byron/gitoxide/blob/072ee32f693a31161cd6a843da6582d13efbb20b/tests/tools/src/lib.rs#L500-L524

I actually witnessed this running on my Mac recently. We have no fix and it's an annoying race condition, so in this case it can be ignored.

Ok, I will keep that in mind. Thus far, it was so consistent though that I wasn't able to identify it as such though. There are at least two other tests, one with concurrency in it's name if I recall correctly, that can deadlock. Those I ignored, thinking that maybe one day there is a solution for that as well. It took a while to get gitoxide's test-suite stable and it's so nice now (to not have flaky failures), I would wish you the same :).

However, this one is new and I'd need to dig into what's going on:

I dug in yesterday but forgot to share my findings. I could reproduce it with:

❯ sed -i 's/Hello World/Hello Radicle/' main.c
sed: 1: "main.c": invalid command code m

…and I think the MacOS fix for 'edit in place - save no backup' is:

❯ sed -I '' 's/Hello World/Hello Radicle/' main.c
sed: main.c: No such file or directory

Of course I wouldn't know yet how to make such a change in the sandbox test description except for copying it entirely, and will leave the fix to you.

Regarding you seeing two test failures with peek_data_line() where I see none, I did retry with the latest git compiled from source, v2.43.0rc2, but still could not reproduce the issue. To be sure I replaced that executable with a script that always fails, and indeed, now 24 out of 75 tests in radicle-node would fail as they definitely picked up the git binary I put in the PATH.

This leaves me thinking that maybe, this failure you observed is related to some other raciness that somehow leaves (or puts) data after the pack itself. Maybe it's related to the platform as well, and it would be interesting to know if this reproduces reliably for you, maybe even on different platforms.

FintanH · 2023-11-15T09:37:56Z

I dug in yesterday but forgot to share my findings. I could reproduce it with:

❯ sed -i 's/Hello World/Hello Radicle/' main.c
sed: 1: "main.c": invalid command code m

…and I think the MacOS fix for 'edit in place - save no backup' is:

❯ sed -I '' 's/Hello World/Hello Radicle/' main.c
sed: main.c: No such file or directory

Ah, something similar happened to me where ls would show hidden files on MacOS if you were the superuser 🤦 Seems like we need to think through cross-platform testing for the CLI code.

This leaves me thinking that maybe, this failure you observed is related to some other raciness that somehow leaves (or puts) data after the pack itself. Maybe it's related to the platform as well, and it would be interesting to know if this reproduces reliably for you, maybe even on different platforms.

Interesting, I can rerun again a few times. I think I'll ask some folks to try it out too since we're all running a host of different Linux OSes!

FintanH · 2023-11-15T11:03:41Z

@Byron can you push a version that includes the assertions? I just realised that the current version is still the one with the sink.

Byron · 2023-11-15T11:17:05Z

I could have, but didn't want to keep spamming force-pushes which also trigger notifications, and instead opted for providing a patch which is reproduced here:

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..4558df49 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -269,7 +269,7 @@ where
             );
             // Consume anything that might still be left on the wire - this is 'EOF' most of the time,
             // but some tests have 'garbage' here as well.
-            std::io::copy(&mut reader, &mut std::io::sink())?;
+            assert!(reader.peek_data_line().is_none());
             assert_eq!(
                 reader.stopped_at(),
                 Some(MessageKind::Flush),

If you want me to force-push nonetheless or push a separate commit for this change, I will of course (it's just that the current version is the known good one that I didn't want to meddle with for trying things).

FintanH · 2023-11-15T11:18:23Z

The patch works :) I meant just push a separate branch though 😁

Byron · 2023-11-15T11:27:27Z

Alright, I just pushed a new commit then that is for later removal, knowing that I now caused the opposite of what I originally intended: to reduce notifications 😅.

By the way, please do feel free to push or force-push directly into this branch if it helps with testing in any way, no need to go through me. Reviews are much faster and easier that way I find. When checked out with gh pr checkout 556 changes can be pushed back without doing anything special.

FintanH · 2023-11-15T14:18:33Z

Thanks for that! So I got some help from colleagues, and you can see the thread here https://radicle.zulipchat.com/#narrow/stream/369277-heartwood/topic/debugging.20gitoxide.20packets

I realised that you might not have seen errors because if radicle-cli fails, for some reason, radicle-node tests will not run. Perhaps you could try run cargo test -p radicle-node with the assertion.

You can see, in the thread, that rudolfs is running MacOS and recent git version runs into the panic by running cargo test -p radicle-node.

Lars also ends up getting the panic on Debian.

Byron · 2023-11-15T18:22:51Z

Thanks for that! So I got some help from colleagues, and you can see the thread here https://radicle.zulipchat.com/#narrow/stream/369277-heartwood/topic/debugging.20gitoxide.20packets

I realised that you might not have seen errors because if radicle-cli fails, for some reason, radicle-node tests will not run. Perhaps you could try run cargo test -p radicle-node with the assertion.

You can see, in the thread, that rudolfs is running MacOS and recent git version runs into the panic by running cargo test -p radicle-node.

I was also running the tests like this to bypass radicle-cli failures initially, but could run everything together using TMPDIR=/tmp cargo nextest run --no-fail-fast --workspace, or individually. Unfortunately, with the patch applied, I never managed to reproduce the issue.

At this point I think one could do all or any of the following:

ignore the spurious failures and make what's here mergeable, we know the 'good' version
provide a patch to rudolfs so they can share which bytes are still in the channel (which is interesting to me as I'd hope it's 1 and 30 in decimal)

Maybe something else should be done as well - just let me know where we are heading :).

FintanH · 2023-11-16T09:46:30Z

At this point I think one could do all or any of the following:
* ignore the spurious failures and make what's here mergeable, we know the 'good' version

* provide a patch to rudolfs so they can share which bytes are still in the channel (which is interesting to me as I'd hope it's 1 and 30 in decimal)
Maybe something else should be done as well - just let me know where we are heading :).

Why not both 😛

I'm happy to take the 'good' version and merge it in :) I'll push the patch via Radicle and can add the link back to here.

Tagging @rudolfs to see if he has time for checking out the leftover bytes. I'm also happy to do it on my system too 👍

Byron · 2023-11-16T10:14:43Z

'All of the following' it is :).

I just have reverted the commit with test-code and have a patch here which does show the remaining bytes left in the channel (I only see an empty string there though).

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..5b45e927 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -1,3 +1,4 @@
+use bstr::ByteSlice;
 use std::{
     borrow::Cow,
     io::{self, BufRead},
@@ -269,7 +270,9 @@ where
             );
             // Consume anything that might still be left on the wire - this is 'EOF' most of the time,
             // but some tests have 'garbage' here as well.
-            std::io::copy(&mut reader, &mut std::io::sink())?;
+            let mut buf = Vec::new();
+            std::io::copy(&mut reader, &mut buf)?;
+            dbg!(buf.as_bstr());
             assert_eq!(
                 reader.stopped_at(),
                 Some(MessageKind::Flush),

I hope that helps to get closer to solving that riddle 😅.

Maybe contributing here isn't the preferred way of doing so, is there something I missed on how to do the same on Radicle? What I am missing in particular is CI as I can't really gauge the quality of my commits while there are local test-failures.

FintanH · 2023-11-16T10:49:40Z

So I was trying to use the dbg patch and it was constantly showing up as empty. So I tried to dig in a bit deeper and used the following:

            let mut buf = Vec::new();
            assert!(reader.peek_data_line().is_none(), "buf={:?}", {
                std::io::copy(&mut reader, &mut buf)?;
                buf
            });

Here's the interesting part...

thread 'z6Mkt67GdsW7715MEfRuP4pSZxJRJh6kj6Y48WRqVv4N1tRk <worker#1>    ' panicked at 'buf=[]', /home/haptop/Developer/heartwood/radicle-fetch/src/transport/fetch.rs:274:13

So it seems like peek_data_line can return Some with no data!

FintanH · 2023-11-16T10:51:31Z

Maybe contributing here isn't the preferred way of doing so, is there something I missed on how to do the same on Radicle? What I am missing in particular is CI as I can't really gauge the quality of my commits while there are local test-failures.

Aye, we create patches using the rad command-line tool which you can pick up here: https://radicle.xyz/#try

I've created a patch. Unfortunately, we don't have review and conversational elements yet -- that happens on Zulip curently.

Byron · 2023-11-16T11:48:28Z

Thanks for trying the patch, and for the pointer for trying the rad tool! I'd definitely love to try it particularly when conversation is available with it, and will be trying the rad + zulip workflow for the next contribution as well.

So it seems like peek_data_line can return Some with no data!

If the code presented in the comment is the one to draw that conclusion from, then I have a different view on it. peek_data_line() returned None which means it didn't see any data (due to the flush packet). When using std::io::copy() from a source that is EOF, which is what the flush packet does, it comes out empty as well. So all there is to see is that there is nothing, or EOF by flush packet.

This seems to indicate though that having bytes left in there is racy? Maybe @rudolfs has more luck with it, seeing the bytes might help to learn where they are from.

And since there now is the Radicle version of this PR I think I will close this :).

FintanH · 2023-11-16T11:58:49Z

peek_data_line() returned None which means it didn't see any data (due to the flush packet).

Perhaps I'm wrong, but the assert! will panic when peek_data_line is Some, since we're saying that we expect it to be None, i.e. is_none. So, peek_data_line is Some and the next argument to assert! will execute, copying whatever bytes are in the reader to the buf. Those copied bytes are empty, judging by the buf=[].

When using std::io::copy() from a source that is EOF, which is what the flush packet does, it comes out empty as well. So all there is to see is that there is nothing, or EOF by flush packet.

Are you saying that the buf=[] indicates that the flush packet is leftover?

Byron · 2023-11-16T12:06:16Z

I enthusiastically exhale that I got this wrong. What I described is what I thought it would be, and what you described is what it actually is.
Yes, it's indeed funny and I think it's a bug which is 'implemented' here: https://github.com/Byron/gitoxide/blob/c3edef1c0c49accbb037bdf086dade3ed0e5e507/gix-packetline/src/read/blocking_io.rs#L152-L154 .

When it encounters EOF during peek, it doesn't return None but the then empty buffer, which explains why it's some. I should be able to add a test for 'peek EOF' to fix it.

Maybe the 'extra bytes' were that bug all along?

FintanH · 2023-11-16T12:15:35Z

I enthusiastically exhale that I got this wrong. What I described is what I thought it would be, and what you described is what it actually is.

Hehe, this happens to me all the time too.

I'm thinking it might be that we were unnecessarily panicking due to the is_none check when indeed we were finished reading :)

Byron · 2023-11-16T12:18:11Z

And digging into this, there are tests which want validate that behaviour, too. But running into this here clearly shows it should be improved, if only the documentation should make clear when to expect None.

In the end, if there are no bytes available, there is we learn nothing new :/.

Byron · 2023-11-16T12:36:27Z

Actually I have no explanation for this - it should return None on EOF and it does return it, too, there are tests for that and I even added a specific one to be very sure.

Maybe you could give it one last-ditch attempt and do dbg!(peek_data_line()), as the Some() might contain an error. If not, I could imagine it's actually empty, which can happen if the server sends 0004 I presume, even though this is actively discouraged.

Thanks again!

FintanH · 2023-11-16T12:56:07Z

Ok, I ran with dbg!(reader.peek_data_line()) and funnily enough this line panicked in a couple of tests:

assert_eq!(
                reader.stopped_at(),
                Some(MessageKind::Flush),
                "the flush packet was now consumed"
            );

The output of the dbg for those were:

[radicle-fetch/src/transport/fetch.rs:273] reader.peek_data_line() = Some(
    Ok(
        Ok(
            [
                2,
                84,
                111,
                116,
                97,
                108,
                32,
                53,
                32,
                40,
                100,
                101,
                108,
                116,
                97,
                32,
                49,
                41,
                44,
                32,
                114,
                101,
                117,
                115,
                101,
                100,
                32,
                48,
                32,
                40,
                100,
                101,
                108,
                116,
                97,
                32,
                48,
                41,
                44,
                32,
                112,
                97,
                99,
                107,
                45,
                114,
                101,
                117,
                115,
                101,
                100,
                32,
                48,
            ],
        ),
    ),
)

Byron · 2023-11-16T14:21:48Z

Can you imagine an archeologist who keeps digging, thinking it's going to be something big, and then all that's showing is a shard of a coffee mug, or something equally benign 😁?

The playground revealed the decoded message is:

"\u{2}Total 5 (delta 1), reused 0 (delta 0), pack-reused 0"

The 2 means it's a sideband message for stderr, which is typically used for progress messages while receiving/waiting for the pack. These definitely happen out of order and apparently they can also arrive a little late, probably particularly for small packs that are basically just a single packetline. The progress-handler is set for the reader, but it's not triggering while peeking as that would be a consuming action, which is why it's visible in that way. This also explains why it's empty when actually reading from it - it's progress after all, not data.

At least that riddle is solved now, case closed :D.

Byron marked this pull request as ready for review November 12, 2023 12:16

Byron mentioned this pull request Nov 12, 2023

Suspect parsing of the packfile GitoxideLabs/gitoxide#972

Closed

1 task

Byron force-pushed the fix-fetch-issue branch from 0e70df4 to 03ecfdb Compare November 13, 2023 09:50

Byron added 2 commits November 13, 2023 13:52

fetch: update gix dependencies to the latest version

95e9c13

Signed-off-by: Sebastian Thiel <[email protected]>

Byron force-pushed the fix-fetch-issue branch from 03ecfdb to 4ace976 Compare November 13, 2023 12:53

FintanH reviewed Nov 13, 2023

View reviewed changes

Byron force-pushed the fix-fetch-issue branch from 539f96b to 4ace976 Compare November 16, 2023 10:07

Byron closed this Nov 16, 2023

Byron deleted the fix-fetch-issue branch November 16, 2023 11:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support multiple fetches in a row #556

support multiple fetches in a row #556

Byron commented Nov 12, 2023 •

edited

Loading

FintanH commented Nov 13, 2023

Byron commented Nov 13, 2023

FintanH Nov 13, 2023

Byron Nov 13, 2023

FintanH commented Nov 13, 2023

Byron commented Nov 13, 2023

sebastinez commented Nov 14, 2023

Byron commented Nov 14, 2023

sebastinez commented Nov 14, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 15, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

support multiple fetches in a row #556

support multiple fetches in a row #556

Conversation

Byron commented Nov 12, 2023 • edited Loading

Review Notes

FintanH commented Nov 13, 2023

Byron commented Nov 13, 2023

FintanH Nov 13, 2023

Choose a reason for hiding this comment

Byron Nov 13, 2023

Choose a reason for hiding this comment

FintanH commented Nov 13, 2023

Byron commented Nov 13, 2023

sebastinez commented Nov 14, 2023

Byron commented Nov 14, 2023

sebastinez commented Nov 14, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 15, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 15, 2023

Byron commented Nov 15, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

Byron commented Nov 16, 2023

FintanH commented Nov 16, 2023

Byron commented Nov 16, 2023

Byron commented Nov 12, 2023 •

edited

Loading