You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
reproduced here in case future diffs or GitHub navigation breaks that link,
let lookup_err = resolver
.soa_lookup(TEST_ZONE).await.expect_err("test zone should not exist");// I think we really should answer with ResponseCode::Refused. We are not// authoritative for the .internal TLD, so we don't know that some *other*// server would have records for `oxide.internal`. It is not a failure of// our server to not know what that domain is, we should just refuse to// answer.//// One may imagine we should return at least NXDomain without the// authoritative bit set. RFC 1035 says that "Name Error - Meaningful only// from an authoritative name server, ...". Does that mean that recursive// resolvers and clients would faithfully ignore our error in that case? Is// there a risk that something would miss the non-authoritative nature of// such an NXDomain and incorrectly cache the non-existence of some other// domain? Hopefully not! Answering `Refused` would side-step this question.expect_no_records_error_code(&lookup_err,ResponseCode::ServFail);
implFrom<QueryError>forRequestError{fnfrom(source:QueryError) -> Self{match&source {QueryError::NoName(_) => RequestError::NxDomain(source),// Bail with servfail when this query is for a zone that we don't// own (and other server-side failures) so that resolvers will look// to other DNS servers for this query.QueryError::NoZone(_)
| QueryError::QueryFail(_)
| QueryError::ParseFail(_) => RequestError::ServFail(source.into()),}}}
from reading RFC 1035 and related (though i'm not certain that no later RFC refines it), "Server failure" means that there was an issue with the name server in processing a request. we obviously should not return "Name Error" (aka NXDOMAIN), as we don't know if a zone we're not authoritative for does not exist. this suggests to me that a more accurate error would be to return Refused, since we can neither say there are nor are not records for the queried domain. as it turns out, this also the response i get from dig a foo. $nameserver against a dozen nameservers i tried.
realistically i think this is unlikely to matter. if we're getting queries for zones we don't know about, something else is pretty horked. but i could imagine some intermediary seeing SERVFAIL as an implication that the server(s) are unhealthy for other queries that we're perfectly happy to answer. (and yes, i do wonder if we should better justify SERVFAIL for QueryFail and ParseFail errors. answers for queries like dig a $(printf "hello\x7fworld.$zone") ns1.$zone get a mix of NXDOMAIN and REFUSED, for comparison)
The text was updated successfully, but these errors were encountered:
@rmustacc notes that the particular error code may be important to some clients that have disparate DNS servers - one example here might be NTP on the rack, using the system libresolv2, where internal DNS may be consulted along with whatever upstream resolver the rack also queries. SERVFAIL here is one way we'd ensured that an error is not taken fatal to the query and instead resulted in the client trying other DNS servers that may be able to actually provide an answer.
so if this behavior is changed, we'll need to be particularly careful that changing Omicron's DNS servers doesn't result in issues for DNS clients outside Omicron (both in the rack and obviously on customer networks).
i'm a bit optimistic here - in some cases SERVFAIL is considered in the same way as REFUSED. but in other cases SERVFAIL seems less fatal than REFUSED. this deserves some more careful reading and testing.
if our DNS servers get a query for an unknown zone, we return SERVFAIL. while working on #8047 i was writing a test that brushed up against this behavior, and i left some notes: https://github.com/oxidecomputer/omicron/pull/8047/files#diff-c0d9697cfa638ea3c770968ab4bd7fd28692d676e50055101339740a1688bd4aR415-R432
reproduced here in case future diffs or GitHub navigation breaks that link,
the behavior now is pretty simple and intentional, this is from
dns-server/src/dns_server.rs
:from reading RFC 1035 and related (though i'm not certain that no later RFC refines it), "Server failure" means that there was an issue with the name server in processing a request. we obviously should not return "Name Error" (aka NXDOMAIN), as we don't know if a zone we're not authoritative for does not exist. this suggests to me that a more accurate error would be to return
Refused
, since we can neither say there are nor are not records for the queried domain. as it turns out, this also the response i get fromdig a foo. $nameserver
against a dozen nameservers i tried.realistically i think this is unlikely to matter. if we're getting queries for zones we don't know about, something else is pretty horked. but i could imagine some intermediary seeing
SERVFAIL
as an implication that the server(s) are unhealthy for other queries that we're perfectly happy to answer. (and yes, i do wonder if we should better justifySERVFAIL
forQueryFail
andParseFail
errors. answers for queries likedig a $(printf "hello\x7fworld.$zone") ns1.$zone
get a mix of NXDOMAIN and REFUSED, for comparison)The text was updated successfully, but these errors were encountered: