DNS servers should have NS and SOA records #8047
Conversation
Cargo.toml (Outdated)

```toml
[patch.crates-io]
progenitor = { git = "https://github.com/oxidecomputer/progenitor", rev = "e4af3302c20e35dff6ceafc61e0175739922c132" }
progenitor-client = { git = "https://github.com/oxidecomputer/progenitor", rev = "e4af3302c20e35dff6ceafc61e0175739922c132" }
```
i'm going to go plumb the Progenitor bump separately; i don't want to make Cargo.toml changes here (moving DNS servers to versioned APIs is noisy enough!)

please disregard the Cargo.toml changes, and i promise not to merge this until this (and Cargo.lock) are unchanged 😇
```rust
/// Perform a *lossy* conversion from the V2 [`DnsConfig`] to the V1
/// [`v1::config::DnsConfig`]. In particular, V2 adds NS and SOA records,
/// which did not exist in V1, so they are silently discarded when
/// converting down.
///
/// If this conversion would leave an empty zone, the zone is omitted
/// entirely.
pub fn as_v1(self) -> v1::config::DnsConfig {
```
so as it turns out, there is one place where we do a get records -> update -> put records pattern, and that is `dnsadm`. that means that a `dnsadm` using the V1 API, currently, could talk to a V2 DNS server, get some records, experience this lossy conversion, add a new record, and then put a new set of mangled records back to the server.

in practice i don't think this is an issue: the only records that would be lost are NS and SOA, which are only on the zone apex. since those are also the only records at the apex, any time a V1 client gets records from a V2 server the `@` records would be omitted (since the lossy conversion would filter all records), so for this cross-version example to actually modify records the V1 client doesn't know about, you'd have to add an A or AAAA or SRV record to `@`.

in reality, i think that this use of `dnsadm` is very rare. i've ~never seen it come up! i've used it, but not to alter records. so the above seems like low risk, which is also why i didn't add support for NS and SOA record management to it.
this is the biggest wrinkle of the DNS API version bump and if there's objections to my approach here, please shout!
I'm not sure how hard this would be to add but would it make sense to disallow a PUT with v1 when any v2 features have been used already (i.e., there are SOA or NS records)? I wonder if we even want to disallow GETs. The main reason to support both versions is for the intermediate state but once anything has started using v2 we don't expect to go backwards.
I don't think this is a big deal because this is mainly a development tool and it probably hasn't been used in ages anyway.
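As a sketch, that guard might look something like the following; `v1_access_allowed`, `current_config`, and the zone/record field names are assumptions standing in for the server's actual state, not the PR's code:

```rust
// hypothetical check before serving a v1 PUT (or GET): refuse if the
// current config already uses v2-only record types, so a v1 client can't
// silently drop them when it writes the config back.
fn v1_access_allowed(current_config: &DnsConfig) -> Result<(), String> {
    let uses_v2_features = current_config.zones.iter().any(|zone| {
        zone.records
            .values()
            .flatten()
            .any(|r| matches!(r, DnsRecord::Ns(_) | DnsRecord::Soa(_)))
    });
    if uses_v2_features {
        Err("config contains NS/SOA records; not representable in v1".into())
    } else {
        Ok(())
    }
}
```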
```rust
impl From<Srv> for DnsRecord {
    fn from(srv: Srv) -> Self {
        DnsRecord::Srv(srv)
    }
}

#[derive(
    Clone,
    Debug,
    Serialize,
    Deserialize,
    JsonSchema,
    PartialEq,
    Eq,
    PartialOrd,
    Ord,
)]
pub struct Srv {
    pub prio: u16,
    pub weight: u16,
    pub port: u16,
    pub target: String,
}

impl From<v1::config::Srv> for Srv {
    fn from(other: v1::config::Srv) -> Self {
        Srv {
            prio: other.prio,
            weight: other.weight,
            port: other.port,
            target: other.target,
        }
    }
}
```
the other option here is to use the `v1::config::Srv` type directly in v2, because it really has not changed. weaving the V1/V2 types together seems more difficult to think about generally, but i'm very open to hearing that the duplication is the more confusing option if folks feel that way.
I would probably use the v1 types directly but I can see going either way.
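For reference, the type-reuse option is essentially a one-liner; the module path here is an assumption:

```rust
// sketch: re-export the unchanged v1 type rather than duplicating it in v2.
pub use crate::v1::config::Srv;
```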
````
@@ -3582,7 +3582,7 @@
   ]
  },
  "RotImageError": {
   "description": "RotImageError\n\n<details><summary>JSON schema</summary>\n\n```json { \"type\": \"string\", \"enum\": [ \"unchecked\", \"first_page_erased\", \"partially_programmed\", \"invalid_length\", \"header_not_programmed\", \"bootloader_too_small\", \"bad_magic\", \"header_image_size\", \"unaligned_length\", \"unsupported_type\", \"reset_vector_not_thumb2\", \"reset_vector\", \"signature\" ] } ``` </details>",
````
i'm pretty confused about the backticks showing up in this file!
I haven't had a chance to look at this closely.
Co-authored-by: David Pacheco <[email protected]>
```
@@ -284,30 +335,38 @@ async fn handle_dns_message(
                (RecordType::A, DnsRecord::A(_)) => true,
                (RecordType::AAAA, DnsRecord::Aaaa(_)) => true,
                (RecordType::SRV, DnsRecord::Srv(_)) => true,
                (RecordType::NS, DnsRecord::Ns(_)) => true,
                (RecordType::SOA, DnsRecord::Soa(_)) => true,
```
one interesting detail here: RFC 1034 describes negative caching as returning an SOA in an authoritative name error, where the `minimum` field of the SOA is used as the TTL for the negative result. that's probably one of the more common ways you'd run into an SOA record if you're not operating nameservers yourself. as it stands, this code only returns an SOA record if you've queried for an SOA record.

in the negative caching case we'd want to return this SOA record in the `soa`\* list provided to `MessageResponseBuilder`. somewhat different kind of plumbing, but it'd be relatively straightforward after this.

\* technically RFC 1034 says "may add an SOA RR to the additional [...]", but it's wrong. this is clarified in 2181 7.1.
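For illustration, a rough sketch of where that SOA would go, assuming trust-dns's `MessageResponseBuilder::build()`; the surrounding handler, `response_header`, and `soa_record` are assumed to already exist:

```rust
// hedged sketch: on an authoritative name error, the SOA belongs in the
// authority section (the `soa` iterator below), not the answers; its
// `minimum` field then bounds the negative-caching TTL for the client.
let response = builder.build(
    response_header,              // Header with the name-error response code
    std::iter::empty(),           // answers: none for a negative response
    std::iter::empty(),           // name_servers
    std::iter::once(&soa_record), // soa: authority section (per RFC 2181 7.1)
    std::iter::empty(),           // additionals
);
```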
```rust
// authoritative for zones. SOA records are in the API types here
// because they are included when reporting this server's records (such
// as via `dns_config_get()`).
```
Does this mean you cannot `GET` the config and `PUT` it right back? That doesn't seem right. (edit: more below.)
```rust
//
// Assuming one generation bump every minute, this overflow
// would affect operations after 8,171 years.
let soa_serial = config.generation.as_u64() as u32;
```
Great comment, but why not use a `try_from` anyway and produce a 500 if this happens?
once the generation goes above `u32::MAX` we would always 500 on any DNS update (until the generation itself wraps `u64`). there's nothing that would reset the DNS generation after initial rack setup, right? so even attempting to update DNS records as part of an update would fail - it'd be pretty difficult for a customer to get out of that situation.
I read the comment to be arguing "this is never going to happen in practice", which I agree with. If somehow it did happen, I'd rather it fail explicitly and in a way that's debuggable (e.g., an explicit error in a log somewhere) rather than implicitly and non-fatally (e.g., serial number goes backwards; downstream clients fail to see any subsequent changes).
so.. i'd think it's very unlikely to happen in practice, but not impossible - if there were haywire automation against a rack that created and destroyed silos in a loop, the threshold here is more like.. "happens in 25 years"? i don't think there's an operation that would increment the generation number in under ~200ms, so (mostly joking) it almost makes sense to return a 500 if the year is earlier than 2045 and this condition occurs, and roll over otherwise 🙃

i otherwise agree that failing explicitly would be better to discover and debug. logging every time the rollover occurs (e.g. `config.generation.as_u64() > 0 && config.generation.as_u64() as u32 == 0`) would be a little better, what do you think? am i weighting the haywire automation risk too high?
I'm afraid that a log message would be very easy to miss. What I'm thinking with the 500 is: someone will notice immediately when this happens and they can follow the failing request directly to the problem.
That said: maybe the better thing here is to change the OpenAPI spec so that the generation is a `u32`. Most importantly:

- we'd validate that it's in range during PUT/GET
- code using the Progenitor client couldn't possibly set a value out of range

This should force the range problem to be dealt with much earlier, at the point where we try to bump the DNS generation number in the first place. That code already has to deal with a range error (whether it does or not, I don't know) -- we're just shrinking that range. That'll be in Nexus, too, so we'll be in a better position to surface that as a fault, once we have richer support for fault reporting.

We could do this in a separate PR. That might also need to change the column of the `dns_version` table -- I'm not sure. Normally I'd be worried about this sort of change breaking deployed systems, but I really don't think any deployed systems have values large enough for this to be a problem and I think we could just make this change without worrying about it.
You also don't need to block this PR on fixing that. For example, you could do whichever of these in this PR and fix the API in a follow-on PR.

If you do go with rollover, though, beware that the DNS server might not see every generation number, so if you just check for `generation as u32 == 0`, you would miss some rollovers.
Nexus is definitely where i'd prefer handling the generation number overflowing too, i like that thought a lot. i'll go with making this a 500 in the DNS server in the mean time. that error path will either go away or move to Nexus in the follow up so it wouldn't be a 30-to-8,171-year timebomb in the same way.
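A sketch of that interim 500, using dropshot's `HttpError` as an assumption about the handler's error type:

```rust
// fail explicitly instead of truncating: a generation that no longer fits
// in the SOA serial's u32 becomes a debuggable 500, rather than a serial
// that silently goes backwards.
let soa_serial = u32::try_from(config.generation.as_u64()).map_err(|_| {
    HttpError::for_internal_error(format!(
        "DNS generation {} exceeds u32::MAX; cannot derive SOA serial",
        config.generation.as_u64(),
    ))
})?;
```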
```rust
// Assuming one generation bump every minute, this overflow
// would affect operations after 8,171 years.
let soa_serial = config.generation.as_u64() as u32;
apex_records.push(DnsRecord::Soa(
```
Ah, I see -- so, why store this at all in the database? What about synthesizing it when we load the config from the database? That will avoid the problem I mentioned above (round-trip GET + PUT doesn't work) and could also future-proof us if we decide to change anything about this.
when you say "store this at all in the database", you mean, why store the SOA record on disk at all? i do like that on the basis of avoiding the "round-trip doesn't work" problem you noted. though in the same way i don't love the SOA record not existing in CRDB, i don't love the SOA record being entirely synthetic: it's invisible to `omdb`, it'd be invisible to `dnsadm`. if support bundles included DNS records, SOA would still be absent!

maybe the thing to do here is create an SOA if one isn't present, and if one is present just replace the serial with `config.generation`. that corrects the round-trip-doesn't-work issue and means we can debug with more custom SOA records if the server's defaults are no good for some environment.
The more I get into the PR, the more strongly I feel like we shouldn't store the SOA record into the DNS database (or, at least, shouldn't expose it back out the API).

> though in the same way i don't love the SOA record not existing in CRDB, i don't love the SOA record being entirely synthetic. it's invisible to omdb, it'd be invisible to dnsadm. if support bundles included DNS records, SOA would still be absent!

All things being equal, I agree. But there's no reason omdb and support bundles can't also report the entire DNS contents (fetched via DNS queries) -- that'd be a better representation of what clients see anyway. So I don't think that should be a constraint on the solution here.

> maybe the thing to do here is create an SOA if one isn't present, and if one is present just replace the serial with config.generation. that corrects the round-trip-doesn't-work issue and means we can debug with more custom SOA records if server's defaults are no good for some environment.

This has the opposite round-trip problem: Nexus might PUT configuration C1 with generation number g1, but if it GETs it back, it will find a different configuration C2 with generation number g1. That's a problem on its own (it violates the constraint that if the `DnsConfig` changes, the generation number must change). (And the DNS server can't bump g1 because it doesn't own that -- that would fork the linear history.) This also introduces a discrepancy between what's in CockroachDB and what's in the DNS config. The database wouldn't have the SOA record, but the actual config would. Worse, if Nexus then wants to create configuration C3 (immediately following C1, as far as it's concerned) with generation number g2, how would the DNS server interpret receiving a request that, relative to the previous generation, is missing the SOA? Is Nexus trying to remove it altogether or did it just not know about it?
Stepping back (sorry I'm sure this restates stuff you know but it's been helpful for me to write this down), the way I'm thinking about this is: right now, there are 1-1 relationships between all of these:
- the contents of the DNS-related database tables
- the contents written to the DNS PUT API
- the contents returned by the DNS GET API
- the contents of the DNS server's database
- the DNS records served by the DNS server
(Right? It's been a while since I've been in this code.)
and there are two other core design principles:
- the abstraction provided by the DNS server is that it's configurable with a monotonically-increasing generation number -- it serves exactly what you configure it with; and
- changes to the DNS config always originate in Nexus, flow to the database, then the DNS servers (via the PUT API), which update their local storage and the external API
All of this keeps things really simple: Nexus is always free to make any changes it likes, and all these flows are pretty trivial because they never do any conflict resolution, they never silently change any values, or ignore any values, etc -- they just take the data they're given and write to the next thing in the flow. This is important because in the end we have a lot of copies of this data across a few different representations, so if any layer is transforming it (semantically), it's a lot harder to reason about and be sure it's all working right.
It's problematic for any point in this flow to create new DNS data that gets exposed to earlier steps in the flow. You can sort of pick your poison but one way or another this gets really messy. We've seen that with the GET/PUT round-trip problem, the different PUT/GET round-trip problem, the problem where the DNS server can't distinguish between an intentional change made by Nexus vs. a case where Nexus just didn't know about some data that it had made up, etc.
I'd claim that it's fine though for any point in this process to make up DNS data that it doesn't expose to earlier in the process. That doesn't introduce any of these problems.
This might be confusing but I imagine us getting to this design like this:
- we start by saying all of the config is specified by the DNS API client
- then we say: we want to add SOA records, but we want the serial number to be filled in implicitly by the generation number because otherwise we're creating a huge pain for clients to try to manage this on their own
- so you could imagine a design where the client does specify an SOA record, but it just doesn't have a serial_number field -- that's implicitly filled in
- This is what I imagine we would do if we were fully fleshing this out, namely if we ever wanted the contents of the SOA to be configurable by Nexus.
- But since we always want exactly one SOA and don't need its contents to be configurable, we didn't bother doing the work to plumb it into CockroachDB, the DNS API, or the DNS storage layer.
If that's too weird, I'd almost rather do the work to put the SOA sans serial number into all those places and have Nexus "fully manage it" (which just means making sure there's always exactly one). I think that'd be a sound approach, just more work than we need to do right now.
The last thing I wanted to mention is future-proofing. Right now, we don't allow the SOA contents to be configured at all and that's fine. If it's completely synthetic, then in the future, we could still allow parts of it to be configured by the API client. If not specified by the API client, we still know how to make up synthetic values. At that point, though, if we're also storing our made-up version in the database, then we won't be able to distinguish an SOA configured by the client from one synthesized by the server. That kind of sucks if we decide later to, say, change the synthesized contents. This might all sound contrived but all I mean to say is that by storing this into the DNS database, we're losing the ability to distinguish "this was written by a client who cares about these values" vs. "the client doesn't care and we should use our defaults [which could change over time]".
> This has the opposite round-trip problem: ...

yeah, that's a lot thornier than i'd initially considered. and

> It's problematic for any point in this flow to create new DNS data that gets exposed to earlier steps in the flow

i hadn't really thought about it as a unidirectional flow, but i see the reasoning. so, i'm convinced that the SOA record really should not be returned through the DNS server's GET.

it probably doesn't make sense to synthesize a SOA record when reading the config, because GETing the config also reads the config in the same way, so we'd have to filter the SOA back out there. i think it works out well as a case in `query_raw`, where if you query for records at the apex we'll create an additional SOA record on top of whatever's defined in the database.

> This might all sound contrived
less than you might think :) i've made basically these same arguments before, i had just assumed the only values we might want to configure here really are the administrative contact at some point in the future.
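A sketch of that `query_raw` shape; `lookup_records` and `synthesize_soa` are made-up helper names, not the PR's code:

```rust
// hypothetical: the stored config never contains an SOA (so GET/PUT
// round-trips cleanly), but DNS queries for the zone apex get one
// synthesized on top of whatever the database holds.
fn query_raw(&self, zone: &DnsConfigZone, name: &str) -> Vec<DnsRecord> {
    let mut records = self.lookup_records(zone, name); // assumed helper
    if name == "@" {
        // serial comes from the config generation, as elsewhere in the PR
        records.push(DnsRecord::Soa(self.synthesize_soa(zone))); // assumed helper
    }
    records
}
```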
```rust
// We'll only have external DNS nameserver records - the A/AAAA records
// for servers themselves, and NS records at the apex.
let baseline_external_dns_names = external_dns_count + 1;
assert_eq!(
    external_dns_zone.records.len(),
    baseline_external_dns_names
);
```
If I'm following this right, this is asserting that there are 4 external DNS zone records. I don't follow why that's right, though -- don't we have an A record and NS record for each of the 3 servers, so 6 records altogether?

Edit: Oh, is `records.len()` the number of different names that have any records at all? So it's: the apex (which has the 3 NS records all at one name) + each of the three servers (each with a different name)? If that's it, maybe the comment could be a little more explicit: "Although there will be 2 records for each external DNS server, all the NS records will be under one name, so there are only N + 1 entries in the `records` map."
i was pretty confused at first here too, for the same reason. i'd tried writing exactly that `assert_eq!(...len(), 6)` even! i'll change `.records` to `.names` in the v2 type and i think that will make it more legible.
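To spell out the counting for three external DNS servers (addresses invented for illustration):

```rust
// illustrative shape of the zone's name -> records map for 3 servers:
//
//   "@"   => [Ns("ns1.<zone>"), Ns("ns2.<zone>"), Ns("ns3.<zone>")]
//   "ns1" => [A(192.0.2.1)]
//   "ns2" => [A(192.0.2.2)]
//   "ns3" => [A(192.0.2.3)]
//
// six records, but only external_dns_count + 1 = 4 keys - which is what
// the assertion counts.
```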
```
@@ -206,20 +206,38 @@ impl super::Nexus {
        );

        let silo_name = &request.recovery_silo.silo_name;
        let dns_records = request
        // Records that should be present at the rack-internal zone apex -
```
It's important to know that this code path only affects the contents of the first internal DNS zone for newly-initialized racks. None of this will run for deployed racks. I assume that's fine?
(We probably should have an end-to-end test that the internal DNS contents doesn't change after the initial blueprint is executed. We have a similar end-to-end test that generating a new blueprint from the initial one makes no changes.)
Relatedly: should we be using `blueprint_internal_dns_config()` like you're doing with the external zone below?
> None of this will run for deployed racks. I assume that's fine?

yeah, the changes in `rack.rs` are really just in service of minimizing the delta between state just after RSS and what the initial blueprint describes. except that, as you noted below, the records don't actually get added to the internal update, which is a bug.

> Relatedly: should we be using `blueprint_internal_dns_config()` like you're doing with the external zone below?

probably! i didn't make that change here because the initial internal DNS records are created by sled-agent, and i'd want to either compare those more closely, or have an end-to-end test like you mention that checks that the records as created by sled-agent are the same as the initial blueprint.
```
@@ -164,16 +170,36 @@ pub fn blueprint_external_dns_config<'a>(
    external_dns_zone_name: String,
) -> DnsConfigZone {
    let nexus_external_ips = blueprint_nexus_external_ips(blueprint);
    let dns_external_ips = blueprint_external_dns_resolver_ips(blueprint);
```
(can't put this comment where I want it, which is blueprint_internal_dns_config)
Why don't we have to make a similar change to `blueprint_internal_dns_config()`?
Relatedly: if we don't already, we should make sure we have tests that the NS records look correct even after we perform reconfigurator activities (e.g., expunge and create a new internal or external DNS zone). This might be easiest to do as a reconfigurator-cli test, where you can just write some reconfigurator-cli commands to do that and verify the expectorate output.
> Why don't we have to make a similar change to `blueprint_internal_dns_config()`?

well, we do. when i'd set this up on my workstation to see it all work, i'd missed this because i'd had the internal DNS update plumbed correctly. in the test you describe, internal `NS` records would not get updated to track actual internal DNS zones. i'll take a look at the reconfigurator-cli tests and do something here.
```rust
        .into_iter()
        .map(|addr| match addr {
            IpAddr::V4(addr) => DnsRecord::A(addr),
            IpAddr::V6(addr) => DnsRecord::Aaaa(addr),
        })
        .collect();

    let records = silos
    let mut zone_records: Vec<DnsRecord> = Vec::new();
    let external_dns_records: Vec<(String, Vec<DnsRecord>)> = dns_external_ips
```
Should we sort these for consistency?
we don't need to, so i'm not sure we'd want to. it seems a bit awkward that lower-numbered `ns{}` records would correlate with lower-numbered IP addresses, and that would mean that SOA records are biased towards the DNS server with the lowest IP.

maybe that's fine though?
The only thing I'm trying to avoid is having the contents of NS records change because of some non-determinism here, causing either blueprint or DNS generation bumps under normal conditions even when there's been no changes. I think that's important to avoid but any deterministic order seems fine.
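A deterministic order is cheap to get, e.g. by sorting the nameserver IPs before numbering them. A minimal sketch, assuming `dns_external_ips` is the IP collection computed above:

```rust
use std::net::IpAddr;

// sketch only: sort nameserver IPs so ns1/ns2/... assignments are stable
// across blueprint regenerations (any deterministic order would do).
let mut sorted_ips: Vec<IpAddr> = dns_external_ips.into_iter().collect();
sorted_ips.sort();
let external_dns_records: Vec<(String, Vec<DnsRecord>)> = sorted_ips
    .iter()
    .enumerate()
    .map(|(i, addr)| {
        let name = format!("ns{}", i + 1);
        let record = match addr {
            IpAddr::V4(a) => DnsRecord::A(*a),
            IpAddr::V6(a) => DnsRecord::Aaaa(*a),
        };
        (name, vec![record])
    })
    .collect();
```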
```rust
/// Return the addresses on which this blueprint's external DNS servers listen
/// for DNS queries.
pub fn blueprint_external_dns_resolver_ips(
```
```diff
-pub fn blueprint_external_dns_resolver_ips(
+pub fn blueprint_external_dns_nameserver_ips(
```

(a lot of stuff uses these interchangeably but I believe resolvers are client-side components that initiate DNS queries and I find it useful to keep this distinction)
you're totally right, yes. i'd just picked a poor name here, i know better :(
lol, no sweat! Honestly, I see these used interchangeably more often than I see them used distinctly.
```
@@ -20,6 +20,17 @@ pub const DNS_ZONE: &str = "control-plane.oxide.internal";
/// development
pub const DNS_ZONE_EXTERNAL_TESTING: &str = "oxide-dev.test";

/// Label for records associated with a zone itself, rather than any names
```
Is this only used as a key into the `records` map?
```rust
}

impl DnsConfigZone {
    fn as_v1(self) -> v1::config::DnsConfigZone {
```
I think we discussed elsewhere... I do wonder if we should fail this request if we have any NS records. We definitely don't want somebody to fetch this, make some other change, then PUT it back without the NS records and not realize they deleted them.
i've marked this "ready", but i'm not merging this as-is - the `progenitor` patch here is not suitable and i'm going to go bump `progenitor` like a normal person first, and maybe even not have to touch `progenitor` versions in this change at all. that might bring the delta back down to below +2000...

aside from Progenitor, this is definitely ready for eyes!
this is probably the more exciting part of the issues outlined in #6944. the changes here get us to the point that for both internal and external DNS, we have:

- NS records at the zone apex, one per DNS server (`ns1.<zone>`, `ns2.<zone>`, ...)
- A/AAAA records for the `ns*.<zone>` names described above
- SOA records for `oxide.internal` (for internal DNS) and `$delegated_domain` (for external DNS)

we do not support zone transfers here. i believe the SOA record here would be reasonable to guide zone transfers if we did, but obviously that's not something i've tested.
SOA fields

the SOA record's `RNAME` is hardcoded to `admin@<zone_name>`. this is out of expediency to provide something, but it's probably wrong most of the time. there's no way to get an MX record installed for `<zone_name>` in the rack's external DNS servers, so barring DNS hijinks in the deployed environment, this will be a dead address.

it seems like the best answer here is to allow configuration of the rack's delegated domain and zone after initial setup, and being able to update an administrative email would fit in pretty naturally there. but we don't have that right now, so `admin@` it is. configuration of external DNS is probably more important in the context of zone transfers and permitting a list of remote addresses to whom we're willing to permit zone transfers. so it feels like this is in the API's future at some point.

bonus
one minorly interesting observation along the way is that external DNS servers in particular are reachable at a few addresses - whichever public address they get in the rack's internal address range, and whichever address they get in the external address range. the public address is what's used for A/AAAA records. so, if you're looking around from inside a DNS zone you can get odd-looking answers like: