Skip to content

Only link delayed transport AFTER real transport has called transportReady() #1494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2016

Conversation

zhangkun83
Copy link
Contributor

Resolves #1477

If TransportSet fails to connect a transport (i.e., transportShutdown()
called without transportReady()), TransportSet will automatically
schedule reconnection for the next address, unless it has reached the end
of the address list, in which case it will fail the delayed transport.

This will avoid stream errors caused by bad addresses appearing before
good addresses in the resolved address list.

@ejona86 Please review this PR as a whole. I will squash all commits upon check-in.

@@ -346,8 +333,18 @@ public void transportShutdown(Status s) {
new Object[] {transport, address});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay for now to mostly lose the statuses for earlier failed transports. However, it would probably be a good idea to add s to this log output in case someone needs to debug why earlier addresses failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ejona86
Copy link
Member

ejona86 commented Feb 28, 2016

@zhangkun83 LGTM.

Travis on Linux got stuck on :grpc-interop-testing:test for 10 minutes without making progress. I restarted the job and it seems to have gotten hung up on it again. I don't expect it is due to a bug in this PR, but this PR may be triggering it to break. I've been having trouble with timeouts exceeding in interop-testing with some of the PRs I've done. I think it is memory-related.

@zhangkun83
Copy link
Contributor Author

I logged the test methods. It's stuck in InProcessTest.pingPong.

io.grpc.testing.integration.InProcessTest > veryLargeRequest STARTED
io.grpc.testing.integration.InProcessTest > veryLargeRequest PASSED
io.grpc.testing.integration.InProcessTest > deadlineExceeded STARTED
io.grpc.testing.integration.InProcessTest > deadlineExceeded PASSED
io.grpc.testing.integration.InProcessTest > deadlineNotExceeded STARTED
io.grpc.testing.integration.InProcessTest > deadlineNotExceeded PASSED
io.grpc.testing.integration.InProcessTest > exchangeMetadataUnaryCall STARTED
io.grpc.testing.integration.InProcessTest > exchangeMetadataUnaryCall PASSED
io.grpc.testing.integration.InProcessTest > unimplementedMethod STARTED
io.grpc.testing.integration.InProcessTest > unimplementedMethod PASSED
io.grpc.testing.integration.InProcessTest > veryLargeResponse STARTED
io.grpc.testing.integration.InProcessTest > veryLargeResponse PASSED
io.grpc.testing.integration.InProcessTest > emptyUnary STARTED
io.grpc.testing.integration.InProcessTest > emptyUnary PASSED
io.grpc.testing.integration.InProcessTest > exchangeMetadataStreamingCall STARTED
io.grpc.testing.integration.InProcessTest > exchangeMetadataStreamingCall PASSED
io.grpc.testing.integration.InProcessTest > pingPong STARTED
No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.

@zhangkun83
Copy link
Contributor Author

Never mind. The test seems to be stuck in various method each time.
I also tried setting memory to 2G:

    test {
        testLogging {
            exceptionFormat = 'full'
            showExceptions true
            showCauses true
            showStackTraces true
            maxGranularity 3
            minGranularity 3
            events = ["started", "failed", "passed"]
        }
        maxHeapSize = '2000m'
    }

Still stuck.

@zhangkun83
Copy link
Contributor Author

I am sure this is the kind of deadlock that I worried in #1408.
See zhangkun83#1 . The last commit, which moved the meat of transportReady() and transportShutdown() to the executor, fixed the test.

@zhangkun83
Copy link
Contributor Author

This change will make obtainActiveTransport() always return a DelayedClientTransport on the first call (right now it would return the real transport). When DelayedClientTransport used with InProcessTransport, there is a risk of deadlock (#1510). I have to put this on hold until we fix the deadlock.

@zhangkun83 zhangkun83 force-pushed the set_transport_when_ready branch from 349f4cd to 5f95085 Compare March 11, 2016 01:55
@zhangkun83
Copy link
Contributor Author

Rebased onto master so it includes the fix from #1526.
Have run grpc-interop-testing:test for 1000 times on my desktop. Had no failures other than 55 times of:

io.grpc.testing.integration.Http2NettyTest > classMethod FAILED
    java.lang.RuntimeException at Http2NettyTest.java:60
        Caused by: java.io.IOException at Http2NettyTest.java:60
            Caused by: java.net.BindException

which I believe is unrelated to this change.

@ejona86 PTAL

@ejona86
Copy link
Member

ejona86 commented Mar 11, 2016

@zhangkun83, the new commit LGTM. And yeah, BindException seems unrelated.

…Ready().

If TransportSet fails to connect a transport (i.e., transportShutdown()
called without transportReady()), TransportSet will automatically
schedule reconnection for the next address, unless it has reached the end
of the address list, in which case it will fail the delayed transport.

This will reduce stream errors caused by bad addresses appearing before
good addresses in the resolved address list.

Before this change, TransportSet would return the real transport on the
first call of obtainActiveTransport(). After this change, it will return
the delayed transport instead.
@zhangkun83 zhangkun83 force-pushed the set_transport_when_ready branch from 5f95085 to 08c74d4 Compare March 11, 2016 22:48
@zhangkun83 zhangkun83 merged commit 08c74d4 into grpc:master Mar 11, 2016
@zhangkun83 zhangkun83 deleted the set_transport_when_ready branch March 11, 2016 22:52
zhangkun83 added a commit to zhangkun83/grpc-java that referenced this pull request May 29, 2016
Previously TransportSet.shutdown() only shuts down the active transport,
which means a transport will not be shutdown if it's not ready yet. This
issue was introduced by grpc#1494 that postponed the assignment of the
active transport till transport ready.
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only link delayed transport AFTER real transport has called transportReady()
2 participants