Skip to content
This repository has been archived by the owner on Apr 29, 2024. It is now read-only.

support multiple fetches in a row #556

Closed
wants to merge 2 commits into from

Conversation

Byron
Copy link
Contributor

@Byron Byron commented Nov 12, 2023

The server implementation here is the first which actually used the capabilities of the V2
git protocol to perform multiple pack-fetches in a row through the same connection, which
isn't currently tested in gitoxide.

Thus it ran into the issue that the pack-writer will read a pack 'perfectly', i.e. without
encountering EOF, which would be encoded by the FLUSH packet (0000 on the wire).

For that reason the command run after a pack was received would still see the flush packet
and immediately assume it's done.

As the pack-resolver for good reason doesn't know anything about this, it's up to the
fetch implementation to assure the transport has been drained of any remaining bytes.

Review Notes

  • The implementation plays it very safe with 'all-assertions'. I can imagine this to be a bit too much.
  • I left a TODO to get rid fo the Delegate-style implementation which I failed to use myself in gix.
    This is an opportunity for improvements, but needs communication to figure out how this could look like.
  • I didn't see CI kicking in, which I tried to check while this was a draft PR. However, locally on MacOS I have 61 failed tests, and validated that it didn't get worse. cargo test -p radicle-node tests::e2e failed before the fix (and after removing the previous fix), and works now, so I am optimistic nothing is broken by it.

@Byron Byron marked this pull request as ready for review November 12, 2023 12:16
@FintanH
Copy link
Contributor

FintanH commented Nov 13, 2023

I didn't see CI kicking in, which I tried to check while this was a draft PR. However, locally on MacOS I have 61 failed tests, and validated that it didn't get worse. cargo test -p radicle-node tests::e2e failed before the fix (and after removing the previous fix), and works now, so I am optimistic nothing is broken by it.

What kind of failures were you running into?

On this branch if I run cargo test -p radilce-node -- --nocapture or cargo test -p radicle-cli -- --nocapture will fail by panicking at one of the assertions:

thread 'z6Mkveba9Mgj6YrpTJpm68YtrjAaZBaPzcjmGX6h7YhfBzD5 <worker#1>    ' panicked at 'the flush packet indicates there is nothing to peek', /home/haptop/Developer/heartwood/radicle-fetch/src/transport/fetch.rs:270:13

@Byron
Copy link
Contributor Author

Byron commented Nov 13, 2023

What kind of failures were you running into?

I think the source of all my 26 errors is Io(Error { kind: InvalidInput, message: "path must be shorter than SUN_LEN" }), and I didn't look into what that could mean yet. That's also the reason I didn't see the legitimate error, from what I can tell the SUN_LEN error happens before that when running cargo test -p radicle-cli -- --nocapture.

It's like I am missing some initial setup.

thread 'z6Mkveba9Mgj6YrpTJpm68YtrjAaZBaPzcjmGX6h7YhfBzD5 <worker#1>    ' panicked at 'the flush packet indicates there is nothing to peek', /home/haptop/Developer/heartwood/radicle-fetch/src/transport/fetch.rs:270:13

That's interesting, as this might mean something similar to what I encountered in the gitoxide test-suite with tests that deal with shallow clones. There it would end up seeing two bytes, 1 and 30 and didn't investigate what these might be.

This failure means there is more than expected left on the wire. In gitoxide I just ended up consuming all of it. For now I have adjusted the code blindly (but with working e2e tests) to implement that. However I plan to investigate the left-over bytes the one test is encountering once I can run it.

I have force-pushed the adjustment in case you want to give it another try.

… a pack

The pack-writer perfectly consumes all input of the pack without
overshoot,  which it can as it knows how many objects it ought
to read.

Thus, the last packetline of the pack is consumed to the last
byte without ever asking for more data that would then be denied
as the flush packet is encountered.

This needs to be accounted for in the calling code.

Signed-off-by: Sebastian Thiel <[email protected]>
);
// Consume anything that might still be left on the wire - this is 'EOF' most of the time,
// but some tests have 'garbage' here as well.
std::io::copy(&mut reader, &mut std::io::sink())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is quite similar to how I was doing it before, except we're not bothering to check if it's the flush packet or not.

fwiw, I tried running the test suite with cargo test --all and it worked 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to hear!

Actually it doesn't seem to always be the flush package, otherwise there wouldn't have been that one test failure. Once my tests are working, I will look at the contents that it's copying, maybe it's the same 'garbage' bytes that I have been seeing.

@FintanH
Copy link
Contributor

FintanH commented Nov 13, 2023

I think the source of all my 26 errors is Io(Error { kind: InvalidInput, message: "path must be shorter than SUN_LEN" }), and I didn't look into what that could mean yet. That's also the reason I didn't see the legitimate error, from what I can tell the SUN_LEN error happens before that when running cargo test -p radicle-cli -- --nocapture.

Hmmmm what OS are you using? From a quick look around it seems to be the Radicle socket paths are too long on your system https://unix.stackexchange.com/questions/367008/why-is-socket-path-length-limited-to-a-hundred-chars

@Byron
Copy link
Contributor Author

Byron commented Nov 13, 2023

It looks like BSDs, and so MacOS, have a 104 character limit. I am on MacOS 14.1.1, so probably it's that. Is there a chance to make the socket path shorter on MacOS?

I investigated real quick and it does turn out that setting RAD_SOCKET=/tmp/rad makes more tests pass. Those who don't are unhappy that the socket is already running. Strangely, even with when making tests single-threaded, it still fails like that as if the socket-device/file isn't removed after the test runs.

This is what I tried:

RAD_SOCKET=/tmp/rad_socket cargo test --all -- --test-threads=1

Which fails with:

'test_replication_via_seed' panicked at 'called `Result::unwrap()` on an `Err` value: AlreadyRunning("/tmp/rad_socket")', /Users/byron/dev/github.com/radicle-dev/heartwood/radicle-node/src/test/environment.rs:379:10

Even when running the tests with RAD_SOCKET=/tmp/rad_socket cargo nextest run -j1 --no-fail-fast, which typically does a better job at coherently running things, it still stumbles over the socket.

When looking at a detailed error log, at least it becomes clear why the sockets are too long - I presume it would create them in the temporary directory…

running 1 test
test: Node::init alice: z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y
test: Node::init bob: z6MkpLDPEXJAdgnZ2xLsJFotNNZCw4CcFEgddpZhRWzT8g6E
1699888427 tests::e2e::test_replication_ref_in_sigrefs radicle_cob::change_graph: loading object 'xyz.radicle.id' '7d16f93be9371c051505352f34c150fe5d12bdc9'
1699888427 tests::e2e::test_replication_ref_in_sigrefs   storage: Setting ref: refs/rad/id -> 7d16f93be9371c051505352f34c150fe5d12bdc9
1699888427 tests::e2e::test_replication_ref_in_sigrefs   storage: Setting ref: refs/heads/master -> f2de534b5e81d7c6e2dcaf58c3dd91573c0a0354
1699888427 tests::e2e::test_replication_ref_in_sigrefs   storage: Setting ref: HEAD -> refs/heads/master
test: Initialized project rad:z2wxtY8oFfoNhJrX84GKCEqmnjo5j for node z6MkpLDPEXJAdgnZ2xLsJFotNNZCw4CcFEgddpZhRWzT8g6E
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening address book /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/addresses.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening gossip store from /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/addresses.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening routing table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/routing.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening tracking policy table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6/node/tracking.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Default tracking policy set to 'block'
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Initializing service (Test)..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Address book is empty. Adding bootstrap nodes..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: 0 nodes added to address book
1699888427 tests::e2e::test_replication_ref_in_sigrefs   service: Init @1699888427000
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Listening on 0.0.0.0:59216..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           node: Running node z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y in /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/hL7Ixmp6..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           node: Binding control socket /tmp/rad_socket..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <control>        control: Control thread listening on socket..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening address book /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/addresses.db..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <service>        service: Received command QueryState(..)
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening gossip store from /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/addresses.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening routing table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/routing.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Opening tracking policy table /private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/tracking.db..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Default tracking policy set to 'block'
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Initializing service (Test)..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Address book is empty. Adding bootstrap nodes..
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: 0 nodes added to address book
1699888427 tests::e2e::test_replication_ref_in_sigrefs   service: Init @1699888427000
1699888427 tests::e2e::test_replication_ref_in_sigrefs      node: Listening on 0.0.0.0:59217..
test: Node z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y shutting down..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <control>        control: Received `{"type":"shutdown"}` on control socket
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <control>        control: Shutdown requested..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           pool: Worker pool shutting down..
1699888427 z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y <runtime>           node: Node shutdown completed for z6MksUg8HWbQavnDZv6QDmtTN7UFqdysZ8t3Bg3qPRPoU68y
test tests::e2e::test_replication_ref_in_sigrefs ... FAILED

failures:

failures:
    tests::e2e::test_replication_ref_in_sigrefs

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 93 filtered out; finished in 0.13s


--- STDERR:              radicle-node tests::e2e::test_replication_ref_in_sigrefs ---
thread 'tests::e2e::test_replication_ref_in_sigrefs' panicked at 'called `Result::unwrap()` on an `Err` value: AlreadyRunning("/tmp/rad_socket")', radicle-node/src/test/environment.rs:379:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

/private/var/folders/x7/jgy95vjs3v3ffszwn0wg0gz80000gn/T/.tmpt4bJiZ/rS4hfeIl/node/a which seems pretty long indeed :).

To make this work, I think on MacOS there would have to be a special case at least for the tests - would you like me to implement that as well? Happy to contribute this as I really do want to run the tests successfully :).

@sebastinez
Copy link
Member

Hi @Byron, so I was also running into this issue on MacOS and I solved it by defining the TMPDIR variable to make the path shorter.

@Byron
Copy link
Contributor Author

Byron commented Nov 14, 2023

Thanks for the hint, @sebastinez, thanks to it I am now down to 3 failures which seem (more) legit. But since you are running MacOS as well I wonder why you (probably) don't see them.

--- STDERR:              radicle-cli::commands rad_diff ---
thread 'rad_diff' panicked at 'Expected success, was 1
stderr:
sed: 1: "main.c": invalid command code m

', radicle-cli-test/src/lib.rs:472:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
--- STDERR:              radicle-cli::commands rad_patch_pull_update ---
thread 'rad_patch_pull_update' panicked at '
--- Expected
++++ actual:   stdout
   1      - {"type":"refsSynced","remote":"z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi","rid":"rad:zhbMU4DUXrzB8xT6qAJh6yZ7bFMK"}
        1 + ✗ Error: timed out reading from control socket

Exit status: 1
', radicle-cli-test/src/lib.rs:468:36
--- STDERR:              radicle-httpd api::v1::stats::routes::test_stats ---
thread 'api::v1::stats::routes::test_stats' panicked at 'called `Result::unwrap()` on an `Err` value: Git(Error { code: -1, klass: 4, message: "src refspec 'refs/heads/master' does not match any existing object" })', radicle-httpd/src/test.rs:194:6

The last one occours for 35 tests in radicle-httpd, and I only post this here for posterity as I am offering to look into making this work on MacOS if there is interest.


@FintanH, regarding the two failures of the previous version of the line that would assert with peek_data_line()

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..4558df49 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -269,7 +269,7 @@ where
             );
             // Consume anything that might still be left on the wire - this is 'EOF' most of the time,
             // but some tests have 'garbage' here as well.
-            std::io::copy(&mut reader, &mut std::io::sink())?;
+            assert!(reader.peek_data_line().is_none());
             assert_eq!(
                 reader.stopped_at(),
                 Some(MessageKind::Flush),

…are not reproducing for me. I only get my expected 37 errors, but none in radicle-node for example, where you definitely saw that assertion fail.

That's strange, except that maybe this is related to the git installed on the system, which probably still is the one building packs. I have this one, maybe yours is newer?

git version 2.39.3 (Apple Git-145)

When I learn about your git version, I can retry with that and see if it reproduces then.

@sebastinez
Copy link
Member

Hey @Byron so I'm on git version

git version 2.42.1

But I just ran the test suite, long time ago since I ran it the last time, and I get the same errors you are getting.

@FintanH
Copy link
Contributor

FintanH commented Nov 15, 2023

The last one occours for 35 tests in radicle-httpd, and I only post this here for posterity as I am offering to look into making this work on MacOS if there is interest.

For this case, it's an oversight in the test suite that I have a fix for here. Essentially, you have a different init.defaultBranch value than master and it was assumed in the test suite that it would be master.

Regarding this failure:

--- STDERR:              radicle-cli::commands rad_patch_pull_update ---
thread 'rad_patch_pull_update' panicked at '
--- Expected
++++ actual:   stdout
   1      - {"type":"refsSynced","remote":"z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi","rid":"rad:zhbMU4DUXrzB8xT6qAJh6yZ7bFMK"}
        1 + ✗ Error: timed out reading from control socket

Exit status: 1
', radicle-cli-test/src/lib.rs:468:36

I actually witnessed this running on my Mac recently. We have no fix and it's an annoying race condition, so in this case it can be ignored.

However, this one is new and I'd need to dig into what's going on:

--- STDERR:              radicle-cli::commands rad_diff ---
thread 'rad_diff' panicked at 'Expected success, was 1
stderr:
sed: 1: "main.c": invalid command code m

', radicle-cli-test/src/lib.rs:472:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

When I learn about your git version, I can retry with that and see if it reproduces then.

Indeed, I am on a newer version of git:

git version 2.40.1

@Byron
Copy link
Contributor Author

Byron commented Nov 15, 2023

For this case, it's an oversight in the test suite that I have a fix for here. Essentially, you have a different init.defaultBranch value than master and it was assumed in the test suite that it would be master.

Ah, good to hear it's something trivial. In the gitoxide test-suite, I do my best to prevent git from picking up anything more than what's local to the repository at hand, and to control a few key variables - these have accumulated over time and might be useful here as well. Here is how this looks like right now: https://github.com/Byron/gitoxide/blob/072ee32f693a31161cd6a843da6582d13efbb20b/tests/tools/src/lib.rs#L500-L524

I actually witnessed this running on my Mac recently. We have no fix and it's an annoying race condition, so in this case it can be ignored.

Ok, I will keep that in mind. Thus far, it was so consistent though that I wasn't able to identify it as such though. There are at least two other tests, one with concurrency in it's name if I recall correctly, that can deadlock. Those I ignored, thinking that maybe one day there is a solution for that as well. It took a while to get gitoxide's test-suite stable and it's so nice now (to not have flaky failures), I would wish you the same :).

However, this one is new and I'd need to dig into what's going on:

I dug in yesterday but forgot to share my findings. I could reproduce it with:

❯ sed -i 's/Hello World/Hello Radicle/' main.c
sed: 1: "main.c": invalid command code m

…and I think the MacOS fix for 'edit in place - save no backup' is:

❯ sed -I '' 's/Hello World/Hello Radicle/' main.c
sed: main.c: No such file or directory

Of course I wouldn't know yet how to make such a change in the sandbox test description except for copying it entirely, and will leave the fix to you.

Regarding you seeing two test failures with peek_data_line() where I see none, I did retry with the latest git compiled from source, v2.43.0rc2, but still could not reproduce the issue. To be sure I replaced that executable with a script that always fails, and indeed, now 24 out of 75 tests in radicle-node would fail as they definitely picked up the git binary I put in the PATH.

This leaves me thinking that maybe, this failure you observed is related to some other raciness that somehow leaves (or puts) data after the pack itself. Maybe it's related to the platform as well, and it would be interesting to know if this reproduces reliably for you, maybe even on different platforms.

@FintanH
Copy link
Contributor

FintanH commented Nov 15, 2023

I dug in yesterday but forgot to share my findings. I could reproduce it with:

❯ sed -i 's/Hello World/Hello Radicle/' main.c
sed: 1: "main.c": invalid command code m

…and I think the MacOS fix for 'edit in place - save no backup' is:

❯ sed -I '' 's/Hello World/Hello Radicle/' main.c
sed: main.c: No such file or directory

Ah, something similar happened to me where ls would show hidden files on MacOS if you were the superuser 🤦 Seems like we need to think through cross-platform testing for the CLI code.

This leaves me thinking that maybe, this failure you observed is related to some other raciness that somehow leaves (or puts) data after the pack itself. Maybe it's related to the platform as well, and it would be interesting to know if this reproduces reliably for you, maybe even on different platforms.

Interesting, I can rerun again a few times. I think I'll ask some folks to try it out too since we're all running a host of different Linux OSes!

@FintanH
Copy link
Contributor

FintanH commented Nov 15, 2023

@Byron can you push a version that includes the assertions? I just realised that the current version is still the one with the sink.

@Byron
Copy link
Contributor Author

Byron commented Nov 15, 2023

I could have, but didn't want to keep spamming force-pushes which also trigger notifications, and instead opted for providing a patch which is reproduced here:

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..4558df49 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -269,7 +269,7 @@ where
             );
             // Consume anything that might still be left on the wire - this is 'EOF' most of the time,
             // but some tests have 'garbage' here as well.
-            std::io::copy(&mut reader, &mut std::io::sink())?;
+            assert!(reader.peek_data_line().is_none());
             assert_eq!(
                 reader.stopped_at(),
                 Some(MessageKind::Flush),

If you want me to force-push nonetheless or push a separate commit for this change, I will of course (it's just that the current version is the known good one that I didn't want to meddle with for trying things).

@FintanH
Copy link
Contributor

FintanH commented Nov 15, 2023

The patch works :) I meant just push a separate branch though 😁

@Byron
Copy link
Contributor Author

Byron commented Nov 15, 2023

Alright, I just pushed a new commit then that is for later removal, knowing that I now caused the opposite of what I originally intended: to reduce notifications 😅.

By the way, please do feel free to push or force-push directly into this branch if it helps with testing in any way, no need to go through me. Reviews are much faster and easier that way I find. When checked out with gh pr checkout 556 changes can be pushed back without doing anything special.

@FintanH
Copy link
Contributor

FintanH commented Nov 15, 2023

Thanks for that! So I got some help from colleagues, and you can see the thread here https://radicle.zulipchat.com/#narrow/stream/369277-heartwood/topic/debugging.20gitoxide.20packets

I realised that you might not have seen errors because if radicle-cli fails, for some reason, radicle-node tests will not run. Perhaps you could try run cargo test -p radicle-node with the assertion.

You can see, in the thread, that rudolfs is running MacOS and recent git version runs into the panic by running cargo test -p radicle-node.

Lars also ends up getting the panic on Debian.

@Byron
Copy link
Contributor Author

Byron commented Nov 15, 2023

Thanks for that! So I got some help from colleagues, and you can see the thread here https://radicle.zulipchat.com/#narrow/stream/369277-heartwood/topic/debugging.20gitoxide.20packets

I realised that you might not have seen errors because if radicle-cli fails, for some reason, radicle-node tests will not run. Perhaps you could try run cargo test -p radicle-node with the assertion.

You can see, in the thread, that rudolfs is running MacOS and recent git version runs into the panic by running cargo test -p radicle-node.

I was also running the tests like this to bypass radicle-cli failures initially, but could run everything together using TMPDIR=/tmp cargo nextest run --no-fail-fast --workspace, or individually. Unfortunately, with the patch applied, I never managed to reproduce the issue.

At this point I think one could do all or any of the following:

  • ignore the spurious failures and make what's here mergeable, we know the 'good' version
  • provide a patch to rudolfs so they can share which bytes are still in the channel (which is interesting to me as I'd hope it's 1 and 30 in decimal)

Maybe something else should be done as well - just let me know where we are heading :).

@FintanH
Copy link
Contributor

FintanH commented Nov 16, 2023

At this point I think one could do all or any of the following:

* ignore the spurious failures and make what's here mergeable, we know the 'good' version

* provide a patch to rudolfs so they can share which bytes are still in the channel (which is interesting to me as I'd hope it's 1 and 30 in decimal)

Maybe something else should be done as well - just let me know where we are heading :).

Why not both 😛

I'm happy to take the 'good' version and merge it in :) I'll push the patch via Radicle and can add the link back to here.

Tagging @rudolfs to see if he has time for checking out the leftover bytes. I'm also happy to do it on my system too 👍

@Byron
Copy link
Contributor Author

Byron commented Nov 16, 2023

'All of the following' it is :).

I just have reverted the commit with test-code and have a patch here which does show the remaining bytes left in the channel (I only see an empty string there though).

diff --git a/radicle-fetch/src/transport/fetch.rs b/radicle-fetch/src/transport/fetch.rs
index 5dfc1535..5b45e927 100644
--- a/radicle-fetch/src/transport/fetch.rs
+++ b/radicle-fetch/src/transport/fetch.rs
@@ -1,3 +1,4 @@
+use bstr::ByteSlice;
 use std::{
     borrow::Cow,
     io::{self, BufRead},
@@ -269,7 +270,9 @@ where
             );
             // Consume anything that might still be left on the wire - this is 'EOF' most of the time,
             // but some tests have 'garbage' here as well.
-            std::io::copy(&mut reader, &mut std::io::sink())?;
+            let mut buf = Vec::new();
+            std::io::copy(&mut reader, &mut buf)?;
+            dbg!(buf.as_bstr());
             assert_eq!(
                 reader.stopped_at(),
                 Some(MessageKind::Flush),

I hope that helps to get closer to solving that riddle 😅.

Maybe contributing here isn't the preferred way of doing so, is there something I missed on how to do the same on Radicle? What I am missing in particular is CI as I can't really gauge the quality of my commits while there are local test-failures.

@FintanH
Copy link
Contributor

FintanH commented Nov 16, 2023

So I was trying to use the dbg patch and it was constantly showing up as empty. So I tried to dig in a bit deeper and used the following:

            let mut buf = Vec::new();
            assert!(reader.peek_data_line().is_none(), "buf={:?}", {
                std::io::copy(&mut reader, &mut buf)?;
                buf
            });

Here's the interesting part...

thread 'z6Mkt67GdsW7715MEfRuP4pSZxJRJh6kj6Y48WRqVv4N1tRk <worker#1>    ' panicked at 'buf=[]', /home/haptop/Developer/heartwood/radicle-fetch/src/transport/fetch.rs:274:13

So it seems like peek_data_line can return Some with no data!

@FintanH
Copy link
Contributor

FintanH commented Nov 16, 2023

Maybe contributing here isn't the preferred way of doing so, is there something I missed on how to do the same on Radicle? What I am missing in particular is CI as I can't really gauge the quality of my commits while there are local test-failures.

Aye, we create patches using the rad command-line tool which you can pick up here: https://radicle.xyz/#try

I've created a patch. Unfortunately, we don't have review and conversational elements yet -- that happens on Zulip curently.

@Byron
Copy link
Contributor Author

Byron commented Nov 16, 2023

Thanks for trying the patch, and for the pointer for trying the rad tool! I'd definitely love to try it particularly when conversation is available with it, and will be trying the rad + zulip workflow for the next contribution as well.

So it seems like peek_data_line can return Some with no data!

If the code presented in the comment is the one to draw that conclusion from, then I have a different view on it. peek_data_line() returned None which means it didn't see any data (due to the flush packet). When using std::io::copy() from a source that is EOF, which is what the flush packet does, it comes out empty as well. So all there is to see is that there is nothing, or EOF by flush packet.

This seems to indicate though that having bytes left in there is racy? Maybe @rudolfs has more luck with it, seeing the bytes might help to learn where they are from.

And since there now is the Radicle version of this PR I think I will close this :).

@Byron Byron closed this Nov 16, 2023
@Byron Byron deleted the fix-fetch-issue branch November 16, 2023 11:48
@FintanH
Copy link
Contributor

FintanH commented Nov 16, 2023

peek_data_line() returned None which means it didn't see any data (due to the flush packet).

Perhaps I'm wrong, but the assert! will panic when peek_data_line is Some, since we're saying that we expect it to be None, i.e. is_none. So, peek_data_line is Some and the next argument to assert! will execute, copying whatever bytes are in the reader to the buf. Those copied bytes are empty, judging by the buf=[].

When using std::io::copy() from a source that is EOF, which is what the flush packet does, it comes out empty as well. So all there is to see is that there is nothing, or EOF by flush packet.

Are you saying that the buf=[] indicates that the flush packet is leftover?

@Byron
Copy link
Contributor Author

Byron commented Nov 16, 2023

I enthusiastically exhale that I got this wrong. What I described is what I thought it would be, and what you described is what it actually is.
Yes, it's indeed funny and I think it's a bug which is 'implemented' here: https://github.com/Byron/gitoxide/blob/c3edef1c0c49accbb037bdf086dade3ed0e5e507/gix-packetline/src/read/blocking_io.rs#L152-L154 .

When it encounters EOF during peek, it doesn't return None but the then empty buffer, which explains why it's some. I should be able to add a test for 'peek EOF' to fix it.

Maybe the 'extra bytes' were that bug all along?

@FintanH
Copy link
Contributor

FintanH commented Nov 16, 2023

I enthusiastically exhale that I got this wrong. What I described is what I thought it would be, and what you described is what it actually is.

Hehe, this happens to me all the time too.

I'm thinking it might be that we were unnecessarily panicking due to the is_none check when indeed we were finished reading :)

@Byron
Copy link
Contributor Author

Byron commented Nov 16, 2023

And digging into this, there are tests which want validate that behaviour, too. But running into this here clearly shows it should be improved, if only the documentation should make clear when to expect None.

In the end, if there are no bytes available, there is we learn nothing new :/.

@Byron
Copy link
Contributor Author

Byron commented Nov 16, 2023

Actually I have no explanation for this - it should return None on EOF and it does return it, too, there are tests for that and I even added a specific one to be very sure.

Maybe you could give it one last-ditch attempt and do dbg!(peek_data_line()), as the Some() might contain an error. If not, I could imagine it's actually empty, which can happen if the server sends 0004 I presume, even though this is actively discouraged.

Thanks again!

@FintanH
Copy link
Contributor

FintanH commented Nov 16, 2023

Ok, I ran with dbg!(reader.peek_data_line()) and funnily enough this line panicked in a couple of tests:

assert_eq!(
                reader.stopped_at(),
                Some(MessageKind::Flush),
                "the flush packet was now consumed"
            );

The output of the dbg for those were:

[radicle-fetch/src/transport/fetch.rs:273] reader.peek_data_line() = Some(
    Ok(
        Ok(
            [
                2,
                84,
                111,
                116,
                97,
                108,
                32,
                53,
                32,
                40,
                100,
                101,
                108,
                116,
                97,
                32,
                49,
                41,
                44,
                32,
                114,
                101,
                117,
                115,
                101,
                100,
                32,
                48,
                32,
                40,
                100,
                101,
                108,
                116,
                97,
                32,
                48,
                41,
                44,
                32,
                112,
                97,
                99,
                107,
                45,
                114,
                101,
                117,
                115,
                101,
                100,
                32,
                48,
            ],
        ),
    ),
)

@Byron
Copy link
Contributor Author

Byron commented Nov 16, 2023

Can you imagine an archeologist who keeps digging, thinking it's going to be something big, and then all that's showing is a shard of a coffee mug, or something equally benign 😁?

The playground revealed the decoded message is:

"\u{2}Total 5 (delta 1), reused 0 (delta 0), pack-reused 0"

The 2 means it's a sideband message for stderr, which is typically used for progress messages while receiving/waiting for the pack. These definitely happen out of order and apparently they can also arrive a little late, probably particularly for small packs that are basically just a single packetline. The progress-handler is set for the reader, but it's not triggering while peeking as that would be a consuming action, which is why it's visible in that way. This also explains why it's empty when actually reading from it - it's progress after all, not data.

At least that riddle is solved now, case closed :D.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants