
Probably should not return ServFail for unknown zones #8061

Open
@iximeow


if our DNS servers get a query for an unknown zone, we return SERVFAIL. while working on #8047 i was writing a test that brushed up against this behavior, and i left some notes: https://github.com/oxidecomputer/omicron/pull/8047/files#diff-c0d9697cfa638ea3c770968ab4bd7fd28692d676e50055101339740a1688bd4aR415-R432

reproduced here in case future diffs or GitHub navigation break that link:

    let lookup_err = resolver
        .soa_lookup(TEST_ZONE)
        .await
        .expect_err("test zone should not exist");
    // I think we really should answer with ResponseCode::Refused. We are not
    // authoritative for the .internal TLD, so we don't know that some *other*
    // server would have records for `oxide.internal`. It is not a failure of
    // our server to not know what that domain is, we should just refuse to
    // answer.
    //
    // One may imagine we should return at least NXDomain without the
    // authoritative bit set. RFC 1035 says that "Name Error - Meaningful only
    // from an authoritative name server, ...". Does that mean that recursive
    // resolvers and clients would faithfully ignore our error in that case? Is
    // there a risk that something would miss the non-authoritative nature of
    // such an NXDomain and incorrectly cache the non-existence of some other
    // domain? Hopefully not! Answering `Refused` would side-step this question.
    expect_no_records_error_code(&lookup_err, ResponseCode::ServFail);

the behavior now is pretty simple and intentional. this is from dns-server/src/dns_server.rs:

    impl From<QueryError> for RequestError {
        fn from(source: QueryError) -> Self {
            match &source {
                QueryError::NoName(_) => RequestError::NxDomain(source),
                // Bail with servfail when this query is for a zone that we don't
                // own (and other server-side failures) so that resolvers will look
                // to other DNS servers for this query.
                QueryError::NoZone(_)
                | QueryError::QueryFail(_)
                | QueryError::ParseFail(_) => RequestError::ServFail(source.into()),
            }
        }
    }

from reading RFC 1035 and related (though i'm not certain that no later RFC refines it), "Server failure" means that there was an issue with the name server in processing a request. we obviously should not return "Name Error" (aka NXDOMAIN), as we don't know whether a zone we're not authoritative for exists. this suggests to me that a more accurate response would be Refused, since we can say neither that there are nor that there are not records for the queried domain. as it turns out, this is also the response i get from dig a foo. $nameserver against a dozen nameservers i tried.
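a minimal sketch of what that change could look like, under stated assumptions: the QueryError and RequestError enums below are simplified stand-ins rather than the real definitions in dns-server/src/dns_server.rs, and RequestError::Refused is a hypothetical variant that does not appear in the code quoted above. only the NoZone arm moves; QueryFail and ParseFail keep their SERVFAIL mapping.

```rust
// Simplified stand-in for the real QueryError; the String payloads are an
// assumption made so the sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
enum QueryError {
    NoName(String),
    NoZone(String),
    QueryFail(String),
    ParseFail(String),
}

// Simplified stand-in for the real RequestError.
#[derive(Debug, PartialEq)]
enum RequestError {
    NxDomain(QueryError),
    // Hypothetical new variant: not present in the code quoted in this issue.
    Refused(QueryError),
    ServFail(String),
}

impl From<QueryError> for RequestError {
    fn from(source: QueryError) -> Self {
        match &source {
            // We are authoritative for the zone and the name is absent.
            QueryError::NoName(_) => RequestError::NxDomain(source),
            // We are not authoritative for the zone, so we can say nothing
            // about whether records exist: refuse rather than fail.
            QueryError::NoZone(_) => RequestError::Refused(source),
            // Left as SERVFAIL in this sketch, matching current behavior.
            QueryError::QueryFail(msg) | QueryError::ParseFail(msg) => {
                RequestError::ServFail(msg.clone())
            }
        }
    }
}
```

with something like this, the test quoted earlier would presumably expect ResponseCode::Refused instead of ResponseCode::ServFail, provided the request handler maps the new variant to that response code.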

realistically i think this is unlikely to matter. if we're getting queries for zones we don't know about, something else is pretty horked. but i could imagine some intermediary seeing SERVFAIL as an implication that the server(s) are unhealthy for other queries that we're perfectly happy to answer. (and yes, i do wonder if we should better justify SERVFAIL for QueryFail and ParseFail errors. answers for queries like dig a $(printf "hello\x7fworld.$zone") ns1.$zone get a mix of NXDOMAIN and REFUSED, for comparison)
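to make the "intermediary" concern concrete, here is a rough sketch of the kind of health heuristic such a component might apply. the RCODE numbers are the ones RFC 1035 assigns; the suggests_unhealthy_server policy is purely hypothetical and not taken from any real resolver or load balancer.

```rust
// RCODE values as assigned in RFC 1035, section 4.1.1.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rcode {
    NoError = 0,
    FormErr = 1,
    ServFail = 2,
    NxDomain = 3,
    NotImp = 4,
    Refused = 5,
}

// Hypothetical intermediary policy: treat SERVFAIL as evidence that the
// server itself is struggling, and every other code as a statement about
// the query rather than about the server.
fn suggests_unhealthy_server(rcode: Rcode) -> bool {
    matches!(rcode, Rcode::ServFail)
}
```

under a policy like this, answering SERVFAIL for unknown zones would count against a server that is otherwise answering its own zones fine, while REFUSED would not.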


Labels: good first issue
