Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subsetting java impl #1

Open
wants to merge 25 commits into
base: master
Choose a base branch
from
Open

subsetting java impl #1

wants to merge 25 commits into from

Conversation

joybestourous
Copy link
Owner

No description provided.

Copy link

@s-matyukevich s-matyukevich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I have a lot of minor comments.


if (addresses.size() <= config.subsetSize) return allAddresses;
if (config.sortAddresses) {
// If we sort, we do so via destination hashcode. This is deterministic but differs from the goland instrumentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*golang

I guess we should align on this, not sure how realistic this is, but in theory people may want to combine go a java clients in a single ring.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we should align. Do you know how golang does the comparison? How does it determine one address is "less than" or "greater than" another?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this myself. It looks like Address.Addr is a string, so golang is using simple string comparison. SocketAddress does not have an equivalent, so we'd have to call .toString() which isn't guaranteed to produce the same result. I think if we want to align, we should adjust on the golang side.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we'd have to call .toString() which isn't guaranteed to produce the same result

Why do you think so? String representation of an IP should be the same in all languages, but hashes could be different.

Copy link
Owner Author

@joybestourous joybestourous Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if toString() is not overridden java will print out the class description. For example, the FakeSocketAddress we use here overrides toString() to be the name of the socket address, so we see [[FakeSocketAddress-server0]/{}]. Without that we get [[io.grpc.util.OutlierDetectionLoadBalancerTest$FakeSocketAddress@5ed4bc]/{}]. SocketAddress is abstract and does not have any valuable information for toString to be meaningful, and we can't guarantee all of it's extending classes override toString. Maybe that's still safer than hashcode, but there's no guarantee.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but can we implement a custom Comparator, which will use string representation of an IP, and use it for sorting? IMO it is much better to add to gRFC that addresses must be sorted by IP rather than by some type of hash and define what hashing function should be used.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I got the problem now, so what you are saying is that there is no way to safely get a string representation of an instance of SocketAddress in java? I guess that outside tests it will always be an instance of InetSocketAddress, which should correctly override toString() and the way FakeSocketAddress overrides it is also fine for tests. Can we receive an instance of any other class from the resolver?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's the problem. InetSocketAddress should be the most common, yeah, but the expectations will break if folks use a custom impl of SocketAddress. In general i'm wary of making assumptions like these because we are accepting any SocketAddress, not only InetSocketAddresses, but I suppose InetSocketAddress is well enough established at this point that it's acceptable to do this and just make sure the assumption is well documented.


ArrayList<EquivalentAddressGroup> eaglist = new ArrayList<>();

for (EquivalentAddressGroup eag : allAddresses.getAddresses()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases EquivalentAddressGroup can contain more then 1 address? I think if this happens it will break the current implementation. Maybe we should check for this case and return an error? Or we probably can construct eaglist directly from subset.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think this would break the current implementation? There is no guarantee that it will contain only one address (if there is, we should convince them to remove the EAG because it honestly is just overcomplicating their entire implementation)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked more into it and I think we should treat every EquivalentAddressGroup as a single address.

A group of SocketAddresses that are considered equivalent when channel makes connections.
Usually the addresses are addresses resolved from the same host name, and connecting to any of them is equally sufficient. They do have order. An address appears earlier on the list is likely to be tried earlier.
  • Looks like RoundRobin LB creates a sub-channel per EAG code
  • DNS resolver creates a EAG per resolved address code

I didn't figure out the use-case when EAG is useful, I think it is used in cases when a resolver can provide multiple addresses to connect to the same host, but I don't understand why this could be useful. Still, looks like the rest of the code treats EAG as a single address and probably should do the same.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm okay, I'll make this change because I agree it is more in line with what they have, but I will also file an issue/question with upstream regarding its purpose. To me it honestly looks like it exists solely because it would be annoying to remove.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah looks like there is an intended purpose: grpc#5263 (comment)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find, this make sense. I think it confirms that we should treat EAG as a single address.

joybestourous pushed a commit that referenced this pull request Apr 3, 2024
6efa9ee added `volatile` to `attributes` after TSAN detected a data
race that was added in 91d15ce. The race was because attributes may be
read from another thread after `transportReady()`, and the
post-filtering assignment occurred after `transportReady()`. The code
now filters the attributes separately so they are updated before calling
`transportReady()`.

Original TSAN failure:
```
  Read of size 4 at 0x0000cd44769c by thread T23:
    #0 io.grpc.netty.NettyClientHandler.getAttributes()Lio/grpc/Attributes; NettyClientHandler.java:327
    #1 io.grpc.netty.NettyClientTransport.getAttributes()Lio/grpc/Attributes; NettyClientTransport.java:363
    #2 io.grpc.netty.NettyClientTransport.newStream(Lio/grpc/MethodDescriptor;Lio/grpc/Metadata;Lio/grpc/CallOptions;[Lio/grpc/ClientStreamTracer;)Lio/grpc/internal/ClientStream; NettyClientTransport.java:183
    grpc#3 io.grpc.internal.MetadataApplierImpl.apply(Lio/grpc/Metadata;)V MetadataApplierImpl.java:74
    grpc#4 io.grpc.auth.GoogleAuthLibraryCallCredentials$1.onSuccess(Ljava/util/Map;)V GoogleAuthLibraryCallCredentials.java:141
    grpc#5 com.google.auth.oauth2.OAuth2Credentials$FutureCallbackToMetadataCallbackAdapter.onSuccess(Lcom/google/auth/oauth2/OAuth2Credentials$OAuthValue;)V OAuth2Credentials.java:534
    grpc#6 com.google.auth.oauth2.OAuth2Credentials$FutureCallbackToMetadataCallbackAdapter.onSuccess(Ljava/lang/Object;)V OAuth2Credentials.java:525
    ...

  Previous write of size 4 at 0x0000cd44769c by thread T24:
    #0 io.grpc.netty.NettyClientHandler$FrameListener.onSettingsRead(Lio/netty/channel/ChannelHandlerContext;Lio/netty/handler/codec/http2/Http2Settings;)V NettyClientHandler.java:920
    #1 io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onSettingsRead(Lio/netty/channel/ChannelHandlerContext;Lio/netty/handler/codec/http2/Http2Settings;)V DefaultHttp2ConnectionDecoder.java:515
    ...
```
joybestourous pushed a commit that referenced this pull request Apr 3, 2024
As discovered by TSAN, the adsStream field is not synchronized.
```
WARNING: ThreadSanitizer: data race (pid=1625)
  Read of size 4 at 0x00009b66fc88 by thread T23 (mutexes: write M0):
    #0 io.grpc.xds.ControlPlaneClient.isReady()Z ControlPlaneClient.java:203
    #1 io.grpc.xds.ControlPlaneClient.readyHandler()V ControlPlaneClient.java:211
    #2 io.grpc.xds.ControlPlaneClient$AdsStream.onReady()V ControlPlaneClient.java:328
    grpc#3 io.grpc.xds.GrpcXdsTransportFactory$EventHandlerToCallListenerAdapter.onReady()V GrpcXdsTransportFactory.java:145
    grpc#4 io.grpc.PartialForwardingClientCallListener.onReady()V PartialForwardingClientCallListener.java:44
    grpc#5 io.grpc.ForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:23
    grpc#6 io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:40
    grpc#7 io.grpc.PartialForwardingClientCallListener.onReady()V PartialForwardingClientCallListener.java:44
    grpc#8 io.grpc.ForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:23
    grpc#9 io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:40
    grpc#10 io.grpc.internal.DelayedClientCall$DelayedListener.onReady()V DelayedClientCall.java:497
    grpc#11 io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamOnReady.runInternal()V ClientCallImpl.java:781
    grpc#12 io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamOnReady.runInContext()V ClientCallImpl.java:772
    grpc#13 io.grpc.internal.ContextRunnable.run()V ContextRunnable.java:37
    grpc#14 io.grpc.internal.SerializingExecutor.run()V SerializingExecutor.java:133
    grpc#15 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V ThreadPoolExecutor.java:1130
    grpc#16 java.util.concurrent.ThreadPoolExecutor$Worker.run()V ThreadPoolExecutor.java:630
    grpc#17 java.lang.Thread.run()V Thread.java:830
    grpc#18 (Generated Stub) <null>

  Previous write of size 4 at 0x00009b66fc88 by thread T4 (mutexes: write M1, write M2, write M3, write M4, write M5):
    #0 io.grpc.xds.ControlPlaneClient$AdsStream.cleanUp()V ControlPlaneClient.java:424
    #1 io.grpc.xds.ControlPlaneClient$AdsStream.close(Ljava/lang/Exception;)V ControlPlaneClient.java:418
    #2 io.grpc.xds.ControlPlaneClient$1.run()V ControlPlaneClient.java:130
    grpc#3 io.grpc.SynchronizationContext.drain()V SynchronizationContext.java:94
    grpc#4 io.grpc.SynchronizationContext.execute(Ljava/lang/Runnable;)V SynchronizationContext.java:126
    grpc#5 io.grpc.xds.XdsClientImpl.shutdown()V XdsClientImpl.java:207
    grpc#6 io.grpc.xds.SharedXdsClientPoolProvider$RefCountedXdsClientObjectPool.returnObject(Ljava/lang/Object;)Lio/grpc/xds/XdsClient; SharedXdsClientPoolProvider.java:144
    grpc#7 io.grpc.xds.SharedXdsClientPoolProvider$RefCountedXdsClientObjectPool.returnObject(Ljava/lang/Object;)Ljava/lang/Object; SharedXdsClientPoolProvider.java:102
    grpc#8 io.grpc.xds.XdsClientFederationTest.cleanUp()V XdsClientFederationTest.java:86
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants