Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op: add recv_multi #250

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Conversation

FrankReh
Copy link
Collaborator

The first operation that supports streaming CQE results.

Adds a Streamable trait, along the lines of the Completable trait.
Comes with stream_next and stream_complete methods.

Adds MultiCQEStream struct, along the lines of the MultiCQEFuture
struct.

Adds a submit_op_stream along the lines of submit_op.

Adds poll_next_op along the lines of poll_op.

The Lifecycle gets two additional methods: take_index, and data.

take_index: returns the Lifecycle index and replaces it with
usize::MAX to show the Lifecycle is no longer represented in the
slab. This feature only used by the new poll_next_op.

data: returns a reference to the Lifecycle's data, to be able to use
it for the Streamable's stream_next calls which only require a
reference. Ownership is still transferred in stream_complete.

Adds the io/recv_multi operation.

The tcp stream recv_multi is a function that
bridges the private types with a public function.

There is no cancel yet for the multishot command but the code can be
written to break out of the loop. Also, when the connection is closed,
the command should fail. It's not tested, but unregistering the buf_ring
might cancel the command too - but maybe not.

Unit tests:

net unit tests and helper functions return io::Return

add recv_multi cases

add BufRingProps to net unit tests - to quiet empty buffer warnings
which are intentional in some tests

Adds new types: bufring::{BufRing, Builder}, bufgroup:{BufX, Bgid, Bid}

    (BufX was chosen as the name of the buffer type of this pool
    simply to make it short and easy to grep for.)

    bufgroup may be better named provbuf, or provided_buffers, down the
    road, especially if we intent to support the other, older, much less
    efficient provided buffer mechanism. But the liburing team
    specifically recommends the buf_ring implementation for ease of use
    and performance reasons so we may not want to incur the overhead of
    supporting both forms.

    bufring could become a trait down the road, where different BufRing
    implementations can be provided. But to avoid the extra decision
    making that comes with designing traits, and to keep the code
    relatively easy to follow, these are all concrate types.

Adds register and unregister functionality to BufRing and to the
tokio_uring driver.

Adds experimental recv, recv_provbuf methods to TcpStream.

    recv_provbuf is a purposefully ugly name. It will be replaced by
    something ... maybe a recv builder sometimme soon.

    Whatever is chosen would then be copied for Udp, Unix and for all
    the other operations that optionally take a provided buffer pool id.

Adds 'net' unit tests. All are admittedly simple ping/pong tests where
only the clients' received lengths are checked, not the actual data.

Adds a tests/common area.

Adds a test case that uses two std::threads, where each thread runs its
own tokio_uring runtime and its own buf_ring provided buffer pool.

    The two-thread case is made long, with many clients, sending many
    large messages to the server and getting them back, in order to see
    gross performance impacts when changing things. It takes 3s on my
    machine. Before going into mainline, the numbers would be changed so
    it took no more than the other unit tests, so about 10ms.

Many TODOs left to cleanup. Primarily Safety rationalizations.

The buffer allocation is made as a single allocation, along with the
ring.

The buffer group id, bgid, also sometimes called the provided buffer
group id, is manually selected through the builder api. There is no
mechanism to pick one automatically. That could be added later but is
not really necessary for this feature to be useful.

This first implementation is without traits and without public
interfaces that would let a user create a different kind of buf_ring or
a different kind of `provided buffers` pool.

There's a question to the liburing team outstanding about how to
interpret an unexpected cqe result of res=0 and flags=4.
The first operation that supports streaming CQE results.

  Adds a Streamable trait, along the lines of the Completable trait.
    Comes with stream_next and stream_complete methods.

  Adds MultiCQEStream struct, along the lines of the MultiCQEFuture
  struct.

  Adds a submit_op_stream along the lines of submit_op.

  Adds poll_next_op along the lines of poll_op.

  The Lifecycle gets two additional methods: take_index, and data.

    take_index: returns the Lifecycle index and replaces it with
    usize::MAX to show the Lifecycle is no longer represented in the
    slab. This feature only used by the new poll_next_op.

    data: returns a reference to the Lifecycle's data, to be able to use
    it for the Streamable's stream_next calls which only require a
    reference. Ownership is still transferred in stream_complete.

Adds the io/recv_multi operation.

The tcp stream recv_multi is a function that
bridges the private types with a public function.

There is no cancel yet for the multishot command but the code can be
written to break out of the loop. Also, when the connection is closed,
the command should fail. It's not tested, but unregistering the buf_ring
might cancel the command too - but maybe not.

Unit tests:

  net unit tests and helper functions return io::Return

  add recv_multi cases

  add BufRingProps to net unit tests - to quiet empty buffer warnings
  which are intentional in some tests
@FrankReh
Copy link
Collaborator Author

This starts as one commit over

So 245 should be reviewed first.


pub(crate) type RecvMultiStream = Op<RecvMulti, MultiCQEStream>;

pub(crate) struct RecvMulti {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about the generalized verions recvmsg_multi too? I'm curious how the API works when receiving a single packet into multiple buffers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the question. You've seen Luca's excellent contribution to io-uring?

Each recvmsg result is going into a single buffer, pulled from the buf_ring pool by the kernel. What am I missing?

Copy link
Contributor

@ollie-etl ollie-etl Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it unconditional have too? recvmsg takes an array of iovec. See PR #252 for the non multi-shot version, and also the existing sendmsg implementation. I read the docs around io_uring_recvmsg_out as an extra on top of the provided iovec specification

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know where you are coming from with this. Can you refer to the liburing documentation to point out what you think I'm missing? The 251 PR doesn't seem relevant to how kernel io_uring implements multishot recvmsg at all?

Copy link
Contributor

@ollie-etl ollie-etl Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know where you are coming from with this. Can you refer to the liburing documentation to point out what you think I'm missing? The 251 PR doesn't seem relevant to how kernel io_uring implements multishot recvmsg at all?

My apologies: #252 is relevant. 251 is completely irrelevant

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question was, have you considered how to cope with streaming operations where multiple buffers are provided in the original msghdr: that implies a collections of buffers

I have considered that collections of buffers are not part of the io_uring multishot receiving APIs so we don't need a streaming API for such an animal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have considered that collections of buffers are not part of the io_uring multishot receiving APIs so we don't need a streaming API for such an animal.

Thats a shame: that's a difference in API vs the non multishot API and the linux syscall, and precludes some of the memory locality use cases i spelt out in #252

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are differences when multishot comes into play - the one you are focusing on here isn't the only one and your wishing the multishot for recvmsg worked differently while this PR is getting multishot for recv support in the crate first. The io_uring team provides extra APIs to suite certain specialized use cases for performance reasons - they don't claim a broad and general API with consistent look and feel across the board.

And we don't need to either. We're trying to make the io_uring APIs accessible for folks who want to use the tokio runtime while providing a sound crate. We can't force every of their APIs into a single representation and there's no need to.

I supported your introduction of the ideas in 252, just earlier today, but I don't understand why you lean on it so heavily in a review of this PR for multishot recv. If something needs to be a shame, please spell it out in 252.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't wish to detract from this pr: I like this a great deal, especially the provided ring buffers required to make it work.

I'm looking at this API, as with most, with my use case in mind, which is UDP packet processing on a woefully underpowered aarch64 platform. Outperforming recv_mmsg is what brought me to this crate in the first place.

Personally I'm not invested in tokio api compliance at all - I simply don't care if we completely diverge. The API I always compare to are the base syscalls

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if it outperforms what you are comparing it to, then it will do so without multishot. Or if you are invested in the io_uring interface, maybe have this discussion at liburing.

@FrankReh FrankReh mentioned this pull request Feb 28, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants