feat(grpc): Add tonic transport #2339
Hi @dfawley and @LucioFranco, could you please review this PR when you have a moment?
@@ -229,6 +235,7 @@ impl InternalSubchannel {
        transport: Arc<dyn Transport>,
        backoff: Arc<dyn Backoff>,
        unregister_fn: Box<dyn FnOnce(SubchannelKey) + Send + Sync>,
        runtime: Arc<dyn Runtime>,
Curious:
Given that we will have the same runtime for all the different gRPC components that require a runtime, did we consider something like a singleton that is initialized at init time, and all the components can use a getter to retrieve and use the singleton instead of the runtime being passed to every component that needs it?
Different grpc channels could theoretically use different runtimes. Maybe that isn't something we need to support, but it's pretty easily attained - it just requires passing around the runtime a bit more than if it were global.
C++ passes the event engine through channel args. In my opinion passing the runtime through a function param allows for cleaner dependency injection. It also enforces that the runtime is set during channel creation, before RPCs are made.
Having a singleton runtime will force all gRPC channels in a binary to use the same runtime. I don't know if this is a con though. We can discuss this in the team meeting.
This is fine, though all of these Arc<dyn ...> should have newtype wrappers to clean this up. I think passing a runtime handle around is totally fine as long as it's cheap to clone. We likely do not want users to have to shuffle a runtime around though.
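A minimal sketch of the newtype idea discussed above, using only the standard library. The `Runtime` trait shape, `RuntimeHandle`, `BoxTask`, and the toy `ThreadRuntime` are all hypothetical names for illustration and are not the crate's actual API; the point is only that a handle wrapping an `Arc<dyn Runtime>` is cheap to clone and can be passed to each component at construction time.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};
use std::thread;

/// Boxed, type-erased task (hypothetical alias).
pub type BoxTask = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

/// Hypothetical stand-in for the crate's runtime trait.
pub trait Runtime: Send + Sync {
    fn spawn(&self, task: BoxTask);
}

/// Newtype wrapper: cloning only bumps the `Arc` reference count.
#[derive(Clone)]
pub struct RuntimeHandle(Arc<dyn Runtime>);

impl RuntimeHandle {
    pub fn new(rt: impl Runtime + 'static) -> Self {
        Self(Arc::new(rt))
    }

    /// Boxes the future so callers never see `Arc<dyn Runtime>` directly.
    pub fn spawn<F>(&self, fut: F)
    where
        F: Future<Output = ()> + Send + 'static,
    {
        self.0.spawn(Box::pin(fut));
    }
}

/// Toy runtime for illustration only: one OS thread per task.
pub struct ThreadRuntime;

/// Wakes the polling thread by unparking it.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

impl Runtime for ThreadRuntime {
    fn spawn(&self, mut task: BoxTask) {
        thread::spawn(move || {
            let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
            let mut cx = Context::from_waker(&waker);
            // Poll until completion, parking between polls.
            while task.as_mut().poll(&mut cx).is_pending() {
                thread::park();
            }
        });
    }
}
```

With this shape, a component such as the subchannel would store a `RuntimeHandle` clone and call `handle.spawn(async move { ... })` instead of reaching for a global or for `tokio::spawn` directly.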
@@ -345,30 +353,34 @@ impl InternalSubchannel {
        let transport = self.transport.clone();
        let address = self.address().address;
        let state_machine_tx = self.state_machine_event_sender.clone();
        let connect_task = tokio::task::spawn(async move {
Do we have some kind of vet equivalent to ensure that task spawning (and other features provided by the runtime) are always only used from the runtime and not from other places (like tokio or the standard library)?
Ultimately (pre-1.0) we want to not have any tokio runtime crates/features listed in Cargo.toml, except if you are using a tokio feature flag. That would prevent such a thing.
I spent some time looking into this. I found two approaches:
- Use clippy's disallowed_methods, disallowed_macros, etc. to block tokio symbols like tokio::spawn, tokio::task::spawn, etc. The problem with this approach is that we need to list every symbol we want to block; there is no glob (*) operator available. It's also easy to miss the clippy warnings since they don't block PR submission.
- Introduce a separate crate, say grpc-runtime-tokio, for the default runtime implementation, and disable tokio's runtime features in the main grpc crate. If a function in the grpc crate tries to call tokio::spawn, it will fail to compile as the required feature will be disabled. The concern with this approach is that we need to export the runtime trait (and related types), which are unstable.
@LucioFranco would like to get your thoughts on this.
Cargo enables a feature for the whole build if any (transitive) dependency enables it. I wasn't seeing any compilation failures even after removing tokio's rt feature from the Cargo.toml. I tracked down the dependency to tower's buffer feature:
cargo tree -i tokio -e features --edges=normal -p grpc --no-default-features
tokio v1.46.1
├── tokio feature "bytes"
│ └── tokio feature "io-util"
│ └── h2 v0.4.11
│ └── h2 feature "default"
│ └── hyper v1.6.0
│ ├── hyper feature "client"
│ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ ├── hyper feature "default"
│ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ └── hyper feature "http2"
│ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
├── tokio feature "default"
│ ├── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ ├── h2 v0.4.11 (*)
│ ├── hyper v1.6.0 (*)
│ ├── tokio-stream v0.1.17
│ │ └── tonic v0.14.0 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/tonic)
│ │ └── tonic feature "codegen"
│ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ │ ├── tokio-stream feature "default"
│ │ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ │ └── tokio-stream feature "time"
│ │ └── tokio-stream feature "default" (*)
│ ├── tokio-util v0.7.15
│ │ └── tower v0.5.2
│ │ ├── tower feature "__common"
│ │ │ ├── tower feature "buffer"
│ │ │ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ │ │ ├── tower feature "limit"
│ │ │ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ │ │ └── tower feature "util"
│ │ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ │ ├── tower feature "buffer" (*)
│ │ ├── tower feature "default"
│ │ │ └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ │ ├── tower feature "futures-core"
│ │ │ └── tower feature "__common" (*)
│ │ ├── tower feature "futures-util"
│ │ │ └── tower feature "util" (*)
│ │ ├── tower feature "limit" (*)
│ │ ├── tower feature "pin-project-lite"
│ │ │ ├── tower feature "__common" (*)
│ │ │ └── tower feature "util" (*)
│ │ ├── tower feature "sync_wrapper"
│ │ │ └── tower feature "util" (*)
│ │ ├── tower feature "tokio"
│ │ │ ├── tower feature "buffer" (*)
│ │ │ └── tower feature "limit" (*)
│ │ ├── tower feature "tokio-util"
│ │ │ ├── tower feature "buffer" (*)
│ │ │ └── tower feature "limit" (*)
│ │ ├── tower feature "tracing"
│ │ │ ├── tower feature "buffer" (*)
│ │ │ └── tower feature "limit" (*)
│ │ └── tower feature "util" (*)
│ │ ├── tokio-util feature "codec"
│ │ │ └── h2 v0.4.11 (*)
│ │ ├── tokio-util feature "default"
│ │ │ └── h2 v0.4.11 (*)
│ │ └── tokio-util feature "io"
│ │ └── h2 v0.4.11 (*)
│ └── tower v0.5.2 (*)
├── tokio feature "io-util" (*)
├── tokio feature "rt"
│ └── tower feature "buffer" (*)
├── tokio feature "sync"
│ ├── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│ ├── hyper v1.6.0 (*)
│ ├── tokio-stream v0.1.17 (*)
│ ├── tokio-util v0.7.15 (*)
│ └── tower v0.5.2 (*)
│ ├── tower feature "buffer" (*)
│ └── tower feature "limit" (*)
└── tokio feature "time"
└── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
├── tokio-stream feature "time" (*)
└── tower feature "limit" (*)
Buffer has a constructor that uses tokio as the default executor. We're not using this constructor though.
I've added a default private feature for the tokio runtime that enables the tokio/rt feature flag. Due to this, tokio::spawn should not be usable outside the grpc::rt::tokio module. If tokio::spawn is used outside this module, the build will fail with default feature flags disabled, failing CI.
We discussed this yesterday. For now this is fine; we can rely on tokio initially until we make some more overall progress.
        self.m
            .lock()
            .unwrap()
            .insert(address_type.to_string(), Arc::new(transport));
    }

    /// Retrieve a name resolver from the registry, or None if not found.
Nit: this comment needs updating.
Fixed the comment.
@@ -26,20 +25,20 @@ impl std::fmt::Debug for TransportRegistry {

 impl TransportRegistry {
     /// Construct an empty name resolver registry.
-    pub fn new() -> Self {
+    pub(crate) fn new() -> Self {
Should we also implement the Default trait for this type by deriving it?
There is an explicit Default implementation below. I removed it and used the derive macro instead.
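As a rough illustration of the change described above, here is a self-contained sketch of a registry whose `Default` comes from the derive macro rather than a hand-written impl. The `Transport` trait body, the `Dummy` type, and the method names `add_transport` / `get_transport` are assumptions made for the example; only the field type mirrors the struct shown in this PR.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the crate's transport trait.
pub trait Transport: Send + Sync {
    fn name(&self) -> &str;
}

/// `Default` can be derived because `Arc`, `Mutex`, and `HashMap`
/// all implement `Default`; `dyn Transport` itself does not need to.
#[derive(Clone, Default)]
pub struct TransportRegistry {
    inner: Arc<Mutex<HashMap<String, Arc<dyn Transport>>>>,
}

impl TransportRegistry {
    /// Construct an empty transport registry.
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a transport for an address type (method name is illustrative).
    pub fn add_transport(&self, address_type: &str, transport: impl Transport + 'static) {
        self.inner
            .lock()
            .unwrap()
            .insert(address_type.to_string(), Arc::new(transport));
    }

    /// Retrieve a transport from the registry, or None if not found.
    pub fn get_transport(&self, address_type: &str) -> Option<Arc<dyn Transport>> {
        self.inner.lock().unwrap().get(address_type).cloned()
    }
}

/// Trivial transport used only to exercise the registry.
pub struct Dummy;

impl Transport for Dummy {
    fn name(&self) -> &str {
        "dummy"
    }
}
```

Because the registry is a clonable handle over shared state, a transport registered through one clone is visible through every other clone.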
@@ -388,7 +400,9 @@ impl InternalSubchannel {
                // error string containing information about why the connection
                // terminated? But what can we do with that error other than logging
                // it, which the transport can do as well?
                svc.disconnected().await;
                if let Err(e) = closed_rx.await {
Should the above task spawn be on the runtime as well instead of directly using tokio?
Yes, used the runtime here.
grpc/Cargo.toml
Outdated
futures = "0.3.31"
tower = { version = "0.5.2", features = ["buffer", "limit", "util"] }
tower-service = "0.3.3"
socket2 = "0.5.10"
Could you please keep these sorted so the diffs are easier to read? E.g. socket2 and tower-service are already present at the same number.
Sorted them using cargo-sort and formatted it using taplo.
// TODO: The following options are specific to HTTP/2. We should
// instead pass an `Attribute` like struct to the connect method instead which
// can hold config relevant to a particular transport.
FWIW I think it's good to eventually have a "common http2 configuration" struct that multiple h2 transports can share for configuration knobs common across any/most h2-based transport implementations. We can also have transport-specific configuration for tonic. And all of those should be in a generic transport configuration, attribute-like struct.
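One possible layering of the configuration described above, sketched with hypothetical struct names (`Http2Options`, `TonicOptions`, `TransportOptions`). The field names are borrowed from the options struct in this PR, but how they would actually be split is an assumption, not a decision recorded in this thread.

```rust
use std::time::Duration;

/// Knobs that any HTTP/2-based transport could share (hypothetical grouping).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Http2Options {
    pub init_stream_window_size: Option<u32>,
    pub init_connection_window_size: Option<u32>,
    pub keep_alive_interval: Option<Duration>,
    pub keep_alive_timeout: Option<Duration>,
    pub max_header_list_size: Option<u32>,
}

/// Knobs specific to the tonic-based transport (hypothetical grouping).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TonicOptions {
    pub concurrency_limit: Option<usize>,
    pub rate_limit: Option<(u64, Duration)>,
}

/// Generic, attribute-like container handed to a transport's connect method.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TransportOptions {
    pub tcp_nodelay: bool,
    pub tcp_keepalive: Option<Duration>,
    pub http2: Http2Options,
    pub tonic: TonicOptions,
}

/// Example of building options with struct-update syntax.
pub fn example_options() -> TransportOptions {
    TransportOptions {
        tcp_nodelay: true,
        http2: Http2Options {
            keep_alive_interval: Some(Duration::from_secs(30)),
            ..Http2Options::default()
        },
        ..TransportOptions::default()
    }
}
```

A transport that is not HTTP/2-based would simply ignore the `http2` section, which is the main appeal of keeping the shared and transport-specific knobs in separate structs.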
grpc/src/client/transport/mod.rs
Outdated
    pub rate_limit: Option<(u64, Duration)>,
    pub tcp_keepalive: Option<Duration>,
    pub tcp_nodelay: bool,
    pub connect_timeout: Option<Duration>,
deadline (Instant) instead?
Changed to Instant.
Oh sorry I dropped this in here as a draft and forgot to revisit before sending. I'm not sure which one is better, or if it doesn't even really matter. I was thinking about discussing it first. Anyway I think it's not too important, so we can leave it either way for now unless you want to spend time on it.
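For readers weighing the two options, a small standard-library sketch of how a relative timeout (`Duration`) and an absolute deadline (`Instant`) relate. The helper names are hypothetical. The practical difference is that a deadline keeps shrinking across retries or multi-step connects, while a timeout restarts each time it is applied.

```rust
use std::time::{Duration, Instant};

/// Convert a relative timeout into an absolute deadline, anchored at `start`.
pub fn deadline_from_timeout(start: Instant, timeout: Duration) -> Instant {
    start + timeout
}

/// Time left until `deadline` as seen at `now`; zero once the deadline has passed.
pub fn remaining(deadline: Instant, now: Instant) -> Duration {
    deadline.saturating_duration_since(now)
}
```

A connect routine that takes a deadline can call `remaining(deadline, Instant::now())` before each step (DNS, TCP, handshake) and pass the result as that step's timeout.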
grpc/src/service.rs
Outdated

 #[async_trait]
 pub trait Service: Send + Sync {
     async fn call(&self, method: String, request: Request) -> Response;
 }

 // TODO: define methods that will allow serialization/deserialization.
-pub trait Message: Any + Send + Sync {}
+pub trait Message: Any + Send + Sync {
+    fn as_any(self: Box<Self>) -> Box<dyn Any>;
I think these can be removed again, right? Because we're using trait upcasting elsewhere now.
Yes, removed.
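A short sketch of why the `as_any` method becomes unnecessary with trait upcasting. This assumes a toolchain where trait-object upcasting coercion is stable (Rust 1.86 or newer); `Greeting`, `Other`, and `downcast_message` are hypothetical names used only for the example.

```rust
use std::any::Any;

/// Same supertrait bounds as the trait in this PR, with no helper method.
pub trait Message: Any + Send + Sync {}

#[derive(Debug, PartialEq)]
pub struct Greeting(pub String);
impl Message for Greeting {}

#[derive(Debug, PartialEq)]
pub struct Other;
impl Message for Other {}

/// Recover the concrete type from a boxed message.
pub fn downcast_message<T: Message>(msg: Box<dyn Message>) -> Option<Box<T>> {
    // Upcasting coercion: Box<dyn Message> -> Box<dyn Any>.
    // Before upcasting was stable this step needed an `as_any` method.
    let any: Box<dyn Any> = msg;
    any.downcast::<T>().ok()
}
```

On older toolchains the coercion line fails to compile, which is the situation the removed `as_any` method was working around.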
Co-authored-by: Doug Fawley <[email protected]>
Mostly just some rust recommendations. Obviously I couldn't compile them but I'm pretty sure they're reasonable suggestions. If they don't work, just let me know.
// Using tower/buffer enables tokio's rt feature even though it's possible to
// create Buffers with a user provided executor.
This seems like something we'll eventually need to solve, right?
    async fn call(&self, method: String, request: GrpcRequest) -> GrpcResponse {
        let mut grpc = self.grpc.clone();
        if let Err(e) = grpc.ready().await {
            let err = Status::unknown(format!("Service was not ready: {}", e));
This text feels slightly wrong. In what ways can this fail? If the connection is lost then we'll use Status::unavailable, and if it times out waiting to accept the request, then Status::deadline_exceeded. Ideally ready() would return something appropriate already that we don't have to wrap, compute, or change, since it's a grpc component.
I copied this from the code generated by Tonic, for example:
tonic/tonic-health/src/generated/grpc_health_v1.rs
Lines 147 to 154 in a738cab
            self.inner
                .ready()
                .await
                .map_err(|e| {
                    tonic::Status::unknown(
                        format!("Service was not ready: {}", e.into()),
                    )
                })?;
This returns an error when the underlying connection has failed, but I'm not sure if there are other cases where the ready() call might fail. I think returning Unavailable makes more sense here.
@LucioFranco, could you confirm whether it's safe to use Unavailable in this context?
According to the gRPC status code definition for Unavailable:
The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations.
The most likely failure would be being unable to acquire a permit slot for a rate limiter. I would not worry so much about this; it's likely to go away, and 99% of impls return ready. I would maybe return RESOURCE_EXHAUSTED?
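To make the options in this thread concrete, here is a sketch of mapping readiness failures to the status codes that were suggested. Both enums are hypothetical: tower's `ready()` does not return a `ReadyError` like this, and the real code would construct a `Status` rather than a bare code. The sketch only records which failure would map to which code under the reviewers' suggestions.

```rust
/// Subset of gRPC status codes relevant to this discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    Unknown,
    DeadlineExceeded,
    ResourceExhausted,
    Unavailable,
}

/// Hypothetical failure reasons for a readiness check.
#[derive(Debug)]
pub enum ReadyError {
    /// The underlying connection was lost.
    ConnectionLost,
    /// Timed out waiting for the service to accept the request.
    TimedOut,
    /// No permit slot available, e.g. from a rate limiter.
    NoPermit,
    /// Anything else; the message is kept for logging.
    Other(String),
}

/// Choose a status code for a readiness failure.
pub fn code_for(err: &ReadyError) -> Code {
    match err {
        ReadyError::ConnectionLost => Code::Unavailable,
        ReadyError::TimedOut => Code::DeadlineExceeded,
        ReadyError::NoPermit => Code::ResourceExhausted,
        ReadyError::Other(_) => Code::Unknown,
    }
}
```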
            let err = Status::unknown(format!("Service was not ready: {}", e));
            return create_error_response(err);
        };
        let path = if let Ok(p) = PathAndQuery::from_maybe_shared(method) {
This should probably happen near the top? Above waiting for the transport at least.
Moved to the top.
        let response = match res {
            Ok(s) => s,
            Err(e) => {
                let stream = futures::stream::once(async { Err(e) });
                return TonicResponse::new(Box::pin(stream));
            }
        };
-        let response = match res {
-            Ok(s) => s,
-            Err(e) => {
-                let stream = futures::stream::once(async { Err(e) });
-                return TonicResponse::new(Box::pin(stream));
-            }
-        };
+        let Ok(response) = res else {
+            let stream = futures::stream::once(async { res });
+            return TonicResponse::new(Box::pin(stream));
+        };
This simplification doesn't quite work because the type of res in the else branch is still a Result. While we can call .unwrap_err() to extract the Error, I was trying to avoid using unwrap-style methods to ensure correctness is enforced at compile time.
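The point above can be illustrated with a simplified, standard-library analogue. The types here are stand-ins (a `Vec` plays the role of the response stream), so this is a sketch of the pattern rather than the PR's code: the `match` arm binds the bare error `e`, which can then be re-wrapped in a `Result` with a different `Ok` type, whereas a `let ... else` block has no binding for the error.

```rust
/// Stand-in for one item of a response stream.
pub type Item = Result<Vec<u8>, String>;

/// Turn a fallible batch of frames into a sequence of stream items.
pub fn into_items(res: Result<Vec<Vec<u8>>, String>) -> Vec<Item> {
    let frames = match res {
        Ok(frames) => frames,
        // `e` is the bare error here, so it can be wrapped in `Item`,
        // whose `Ok` type differs from that of `res`. No unwrap needed.
        Err(e) => return vec![Err(e)],
    };
    frames.into_iter().map(Ok).collect()
}
```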
        let mut new_req = TonicRequest::new(bytes_stream as _);
        *new_req.metadata_mut() = metadata;
        *new_req.extensions_mut() = extensions;
        new_req
Yes, changed.
grpc/src/client/transport/mod.rs
Outdated
    pub init_stream_window_size: Option<u32>,
    pub init_connection_window_size: Option<u32>,
    pub http2_keep_alive_interval: Option<Duration>,
    pub http2_keep_alive_timeout: Option<Duration>,
    pub http2_keep_alive_while_idle: Option<bool>,
    pub http2_max_header_list_size: Option<u32>,
    pub http2_adaptive_window: Option<bool>,
    pub concurrency_limit: Option<usize>,
    pub rate_limit: Option<(u64, Duration)>,
    pub tcp_keepalive: Option<Duration>,
    pub tcp_nodelay: bool,
    pub connect_timeout: Option<Duration>,
These pubs should also be pub(crate), since a field's visibility can't be greater than the parent's. I believe the compiler should warn about this, so maybe the warning is disabled?
/// the address type they are intended to handle.
#[derive(Clone)]
pub struct TransportRegistry {
    m: Arc<Mutex<HashMap<String, Arc<dyn Transport>>>>,
I prefer to use inner rather than one-letter field names.
use futures::stream::StreamExt;
use futures::Stream;
Both of these need to pull from tokio_stream and we should remove futures as much as possible.
use hyper::client::conn::http2::Builder;
use hyper::client::conn::http2::SendRequest;
use std::{
    error::Error,
    future::Future,
    net::SocketAddr,
    pin::Pin,
    str::FromStr,
    sync::Arc,
    task::{Context, Poll},
};
use tonic::Request as TonicRequest;
use tonic::Response as TonicResponse;
use tonic::Streaming;
use tower::{
    buffer::{future::ResponseFuture as BufferResponseFuture, Buffer},
    limit::{ConcurrencyLimitLayer, RateLimitLayer},
    util::BoxService,
    ServiceBuilder,
};
I mentioned this in @dfawley's PR but we should be consistent on our import style. I say we just follow the pattern in tonic.
This PR includes the following:
- … tonic/src/transport, required code is copied over.
- … Bytes. This is a temporary workaround until tonic supports bypassing the codec and receiving bytes.
- … examples/tower since it was causing udeps failures due to cargo's feature resolution in workspaces.