Description
if our DNS servers get a query for an unknown zone, we return SERVFAIL. while working on #8047 i was writing a test that brushed up against this behavior, and i left some notes: https://github.com/oxidecomputer/omicron/pull/8047/files#diff-c0d9697cfa638ea3c770968ab4bd7fd28692d676e50055101339740a1688bd4aR415-R432
reproduced here in case future diffs or GitHub navigation breaks that link,
let lookup_err = resolver
.soa_lookup(TEST_ZONE)
.await
.expect_err("test zone should not exist");
// I think we really should answer with ResponseCode::Refused. We are not
// authoritative for the .internal TLD, so we don't know that some *other*
// server would have records for `oxide.internal`. It is not a failure of
// our server to not know what that domain is, we should just refuse to
// answer.
//
// One may imagine we should return at least NXDomain without the
// authoritative bit set. RFC 1035 says that "Name Error - Meaningful only
// from an authoritative name server, ...". Does that mean that recursive
// resolvers and clients would faithfully ignore our error in that case? Is
// there a risk that something would miss the non-authoritative nature of
// such an NXDomain and incorrectly cache the non-existence of some other
// domain? Hopefully not! Answering `Refused` would side-step this question.
expect_no_records_error_code(&lookup_err, ResponseCode::ServFail);
the behavior now is pretty simple and intentional, this is from dns-server/src/dns_server.rs
:
impl From<QueryError> for RequestError {
fn from(source: QueryError) -> Self {
match &source {
QueryError::NoName(_) => RequestError::NxDomain(source),
// Bail with servfail when this query is for a zone that we don't
// own (and other server-side failures) so that resolvers will look
// to other DNS servers for this query.
QueryError::NoZone(_)
| QueryError::QueryFail(_)
| QueryError::ParseFail(_) => RequestError::ServFail(source.into()),
}
}
}
from reading RFC 1035 and related (though i'm not certain that no later RFC refines it), "Server failure" means that there was an issue with the name server in processing a request. we obviously should not return "Name Error" (aka NXDOMAIN), as we don't know if a zone we're not authoritative for does not exist. this suggests to me that a more accurate error would be to return Refused
, since we can neither say there are nor are not records for the queried domain. as it turns out, this also the response i get from dig a foo. $nameserver
against a dozen nameservers i tried.
realistically i think this is unlikely to matter. if we're getting queries for zones we don't know about, something else is pretty horked. but i could imagine some intermediary seeing SERVFAIL
as an implication that the server(s) are unhealthy for other queries that we're perfectly happy to answer. (and yes, i do wonder if we should better justify SERVFAIL
for QueryFail
and ParseFail
errors. answers for queries like dig a $(printf "hello\x7fworld.$zone") ns1.$zone
get a mix of NXDOMAIN and REFUSED, for comparison)