Description
This issue has been migrated from #9774.
Originally discovered on #synapse:matrix.org by @LTangaF
On Joel's server, the following DNS query times out:
root@5d0681f56cda:/# dig _matrix._tcp.matrix.lion.fm SRV
; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> _matrix._tcp.matrix.lion.fm SRV
;; global options: +cmd
;; connection timed out; no servers could be reached
A query for a valid SRV record, by contrast, does not time out:
root@5d0681f56cda:/# dig _matrix._tcp.jboi.nl SRV
; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> _matrix._tcp.jboi.nl SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 560
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_matrix._tcp.jboi.nl. IN SRV
;; ANSWER SECTION:
_matrix._tcp.jboi.nl. 120 IN SRV 0 0 443 matrix.jboi.nl.
;; Query time: 40 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Apr 08 20:50:06 UTC 2021
;; MSG SIZE rcvd: 83
This is already odd, but synapse currently doesn't specify a timeout when looking up SRV records.
The offending snippet is this:
When the underlying DNS query hangs, this call never completes, which causes the federation transmission loop to "time out" the whole request and put it on catch-up.
twisted has the following interface for lookupService:
def lookupService(name: str, timeout: Sequence[int]) -> "Deferred":
"""
Perform an SRV record lookup.
@param name: DNS name to resolve.
@param timeout: Number of seconds after which to reissue the query.
When the last timeout expires, the query is considered failed.
@return: A L{Deferred} which fires with a three-tuple of lists of
L{twisted.names.dns.RRHeader} instances. The first element of the
tuple gives answers. The second element of the tuple gives
authorities. The third element of the tuple gives additional
information. The L{Deferred} may instead fail with one of the
exceptions defined in L{twisted.names.error} or with
C{NotImplementedError}.
"""
The optional timeout parameter sets that limit. However, Synapse does not pass one, so the lookup never times out, or at least Synapse does not give it a strict enough timeout.
I propose a 15-second timeout, added by passing timeout=(15,) in the SrvResolver.resolve_service snippet.
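A minimal sketch of that proposal, assuming only that the DNS client exposes the lookupService(name, timeout) interface quoted above. The helper lookup_srv_with_timeout, the constant SRV_LOOKUP_TIMEOUT and the RecordingResolver stub are hypothetical names for illustration, not Synapse or Twisted code; the stub exists only so the example runs without a network.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical constant: a single 15-second attempt, as proposed above.
SRV_LOOKUP_TIMEOUT: Tuple[int, ...] = (15,)


def lookup_srv_with_timeout(
    dns_client: Any,
    service_name: str,
    timeout: Sequence[int] = SRV_LOOKUP_TIMEOUT,
) -> Any:
    """Issue an SRV lookup with an explicit timeout sequence.

    `dns_client` is assumed to expose `lookupService(name, timeout)`,
    matching the interface quoted in this issue. Whatever that call
    returns (a Deferred in Twisted) is passed back unchanged.
    """
    return dns_client.lookupService(service_name, timeout=timeout)


class RecordingResolver:
    """Stand-in resolver that records the arguments it was called with."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Tuple[int, ...]]] = []

    def lookupService(self, name: str, timeout: Sequence[int]) -> str:
        self.calls.append((name, tuple(timeout)))
        return "deferred-placeholder"  # a real client would return a Deferred


resolver = RecordingResolver()
lookup_srv_with_timeout(resolver, "_matrix._tcp.matrix.lion.fm")
print(resolver.calls)
```

With a single-element tuple the query is issued once and considered failed after 15 seconds, rather than being reissued over a longer schedule.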
Edit: The default resolver uses timeouts of (1, 3, 11, 45). These are applied one after another and add up, so it tries to resolve DNS for exactly 60 seconds before giving up. That puts it in a "timeout race" with the previously established HTTP agent timeout (also 60 seconds), so the DNS query never promptly times out before the enclosing HTTP request timeout does.
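The arithmetic behind that race can be checked with a short sketch. It assumes, as described above, that the per-attempt timeouts are applied one after another, and it uses the 60-second HTTP agent timeout stated in this issue. The function and constant names are hypothetical and do not come from Synapse or Twisted.

```python
from itertools import accumulate
from typing import List, Sequence

DEFAULT_DNS_TIMEOUTS = (1, 3, 11, 45)  # default resolver timeouts quoted above
PROPOSED_DNS_TIMEOUTS = (15,)          # the proposal from this issue
HTTP_AGENT_TIMEOUT = 60                # seconds, as stated above


def retry_deadlines(timeouts: Sequence[int]) -> List[int]:
    """Elapsed seconds at which each successive attempt expires."""
    return list(accumulate(timeouts))


def fails_before(timeouts: Sequence[int], outer_timeout: int) -> bool:
    """True if the DNS lookup gives up strictly before the outer timeout."""
    return sum(timeouts) < outer_timeout


print(retry_deadlines(DEFAULT_DNS_TIMEOUTS))                    # [1, 4, 15, 60]
print(fails_before(DEFAULT_DNS_TIMEOUTS, HTTP_AGENT_TIMEOUT))   # False
print(fails_before(PROPOSED_DNS_TIMEOUTS, HTTP_AGENT_TIMEOUT))  # True
```

The default schedule expires at the same instant as the HTTP timeout, so the DNS failure cannot reliably surface first; a 15-second limit leaves 45 seconds of headroom.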