Bind9 named server interaction with block_ipv6 activated resulting in SERVFAIL queries #2716

Open
Daryes opened this issue Oct 26, 2024 · 5 comments

Comments

@Daryes

Daryes commented Oct 26, 2024

It is probably the same as the reddit discussion with blacklist

Output of the following commands:

# /usr/local/bin/dnscrypt-proxy --config /etc/dnscrypt-proxy/dnscrypt-proxy.toml -version
2.1.4

# /usr/local/bin/dnscrypt-proxy --config /etc/dnscrypt-proxy/dnscrypt-proxy.toml -check
[2024-10-26 20:14:58] [NOTICE] dnscrypt-proxy 2.1.4
[2024-10-26 20:14:58] [NOTICE] Configuration successfully checked

# /usr/local/bin/dnscrypt-proxy --config /etc/dnscrypt-proxy/dnscrypt-proxy.toml -resolve example.com
Resolving [example.com] using 127.0.2.1 port 53

Resolver      : **sensitive**

Canonical name: example.com.

IPv4 addresses: 93.184.215.14
IPv6 addresses: -

Name servers  : b.iana-servers.net., a.iana-servers.net.
DNSSEC signed : yes
Mail servers  : 1 mail servers found

HTTPS alias   : -
HTTPS info    : -

Host info     : -
TXT records   : wgyf8z8cgvm2qmxpnbnldrcltvk4xqfn, v=spf1 -all

What is affected by this bug?

All IPv6 (AAAA) queries, as well as explicit IPv4+IPv6 DNS record queries, end in SERVFAIL.

When does this occur?

Always, when block_ipv6 = true is set in the dnscrypt-proxy configuration and a Bind9 DNS server sits in front of it, using dnscrypt-proxy as a forwarder.

How do we replicate the issue?

I currently run a Bind9 DNS server for the internal network, which forwards external queries to a dnscrypt-proxy server.
IPv6 is not used on the network, so it is blocked by dnscrypt-proxy.
This usually is working well, but it seems there are more and more applications explicitly querying for both A and AAAA records.

That's when I found that such queries come back with the SERVFAIL status when they go through Bind9 to dnscrypt-proxy.

When investigating, I found the following when querying for an AAAA record:

  1. If block_ipv6 = true, querying dnscrypt-proxy directly returns the status NOERROR.
  2. If block_ipv6 = false, querying through Bind9 gives an answer with the status NOERROR.
  3. With another DNS server (Windows) chained behind Bind9, an AAAA query for a name that has an A record but no AAAA record returns a correct NOERROR status (that server does not contain a single IPv6 record).
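
The three checks above can also be scripted without dig. The following is a minimal sketch that only uses the Python standard library; the function names are made up for illustration, and the server addresses you pass in depend on your own setup.

```python
import socket
import struct

QTYPE_AAAA = 28  # AAAA record type
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name: str, qtype: int = QTYPE_AAAA, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query with the RD flag set and a single question."""
    # Header: id, flags (RD only = 0x0100), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".")]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def summarize(response: bytes) -> dict:
    """Extract the status and the section counts from a DNS response header."""
    _id, flags, _qd, an, ns, ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return {
        "status": RCODES.get(rcode, str(rcode)),
        "answer": an,
        "authority": ns,
        "additional": ar,
    }


def query(server: str, port: int, name: str, timeout: float = 3.0) -> dict:
    """Send one AAAA query over UDP and summarize the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        response, _ = sock.recvfrom(4096)
    return summarize(response)
```

Calling `query("127.0.2.1", 53, "example.com")` targets dnscrypt-proxy directly (the address shown in the -resolve output above); calling it again with the Bind9 address lets you compare the two statuses side by side.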

It seems the HINFO answer generated by dnscrypt-proxy when blocking IPv6 records is accepted by most clients, but not by Bind.

With block_ipv6 active, I used "dig" to send one query directly to dnscrypt-proxy and one to the Windows server, and the answer flags differ:

  • Windows dns : Flags: qr aa rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 1; ADDITIONAL: 0
  • dnscrypt-proxy : Flags: qr ra; QUERY: 1; ANSWER: 1; AUTHORITY: 1; ADDITIONAL: 0

The generated HINFO answer from dnscrypt-proxy is missing two flags: aa (authoritative) and rd (recursion desired).

So I captured some packets to confirm this and decoded the header content with Wireshark:
Windows DNS answer header

Flags: 0x8580 (Standard query response, No error)
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .1.. .... .... = Authoritative: Server is an authority for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...1 .... .... = Recursion desired: Do query recursively
.... .... 1... .... = Recursion available: Server can do recursive queries
.... .... .0.. .... = Z: reserved (0)
.... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
.... .... ...0 .... = Non-authenticated data: Unacceptable
.... .... .... 0000 = Reply code: No error (0)
Questions: 1
Answer RRs: 0
Authority RRs: 1
Additional RRs: 0

dnscrypt-proxy answer header

Flags: 0x8080 (Standard query response, No error)
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .0.. .... .... = Authoritative: Server is not an authority for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...0 .... .... = Recursion desired: Don't do query recursively
.... .... 1... .... = Recursion available: Server can do recursive queries
.... .... .0.. .... = Z: reserved (0)
.... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
.... .... ...0 .... = Non-authenticated data: Unacceptable
.... .... .... 0000 = Reply code: No error (0)
Questions: 1
Answer RRs: 1
Authority RRs: 1
Additional RRs: 0

Both flags are indeed missing from the dnscrypt-proxy answer.
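
The same comparison can be reproduced from the two raw flag words in the captures. This is a small sketch using the standard DNS header bit layout; the helper names are made up for illustration.

```python
# Bit masks of the flag word in the DNS header, in header order.
FLAG_BITS = {
    "qr": 0x8000,  # response
    "aa": 0x0400,  # authoritative answer
    "tc": 0x0200,  # truncated
    "rd": 0x0100,  # recursion desired
    "ra": 0x0080,  # recursion available
    "ad": 0x0020,  # authenticated data
    "cd": 0x0010,  # checking disabled
}


def decode_flags(word: int) -> dict:
    """Split a 16-bit DNS flag word into its set flags, opcode and rcode."""
    return {
        "flags": [name for name, mask in FLAG_BITS.items() if word & mask],
        "opcode": (word >> 11) & 0xF,
        "rcode": word & 0xF,
    }


def missing_flags(reference: int, other: int) -> list:
    """Return the flags set in `reference` but not in `other`."""
    return [
        name
        for name, mask in FLAG_BITS.items()
        if reference & mask and not other & mask
    ]


windows = 0x8580         # flag word from the Windows DNS capture
dnscrypt_proxy = 0x8080  # flag word from the dnscrypt-proxy capture

print(decode_flags(windows)["flags"])          # ['qr', 'aa', 'rd', 'ra']
print(decode_flags(dnscrypt_proxy)["flags"])   # ['qr', 'ra']
print(missing_flags(windows, dnscrypt_proxy))  # ['aa', 'rd']
```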

Expected behavior (i.e. solution)

I suspect that, since Bind9 has dnscrypt-proxy as its only forwarder, it forces the status to SERVFAIL because the AAAA answer is absent, no recursion was done, and the server is not authoritative. I didn't find in the RFCs what the correct behavior is here, if there is one.

But I don't see how to alter the dns.HINFO construct in the block_ipv6 plugin of dnscrypt-proxy to add the "authoritative" and "recursion desired" flags, in order to check whether Bind9 is happy with them.

That's the last piece I am looking for; I can run a test build if required.

@lifenjoiner
Member

Tried on Win10, but no SERVFAIL.

BIND 9.16.28 (Extended Support Version)
dnscrypt-proxy 2.1.5

dig -p 5353 -t aaaa github.com

; <<>> DiG 9.16.28 <<>> -p 5353 -t aaaa github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 32923
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: fb125caf5feb003f01000000671d9e35116dcf6274df268c (good)
;; QUESTION SECTION:
;github.com.                    IN      AAAA

;; Query time: 1207 msec
;; SERVER: 127.0.0.1#5353(127.0.0.1)
;; WHEN: Sun Oct 27 19:08:13 2024
;; MSG SIZE  rcvd: 67

==>

27-Oct-2024 19:08:13.475 DNS format error from 127.0.0.1#33053 resolving github.com/AAAA for 127.0.0.1#64922: reply has no answer
27-Oct-2024 19:08:13.475 FORMERR resolving 'github.com/AAAA/IN': 127.0.0.1#33053

==>

[2024-10-27 19:05:40] [NOTICE] dnscrypt-proxy 2.1.5
[2024-10-27 19:08:13]   127.0.0.1       github.com      AAAA    SYNTH   0ms     -

The error message in bind9 should not matter. Dig can receive the answer.
If you really want to eliminate it, tweak blocked_query_response as the link you listed says.

@Daryes
Author

Daryes commented Oct 28, 2024

Win10 plays no part here, and the problem is not whether dig is able to receive the answer.
Besides, your own logs show that the dig query returns SERVFAIL.

That's where the problem lies, as I was explaining with this line:

This usually is working well, but it seems there are more and more applications explicitly querying for both A and AAAA records.

To go into more detail, one of the root causes is musl libc, which replaces glibc on Alpine distributions and on anything else that needs a lightweight system.
Musl libc has fewer tuning options than glibc for DNS resolution and sends the A and AAAA queries simultaneously. It returns an error when Bind is chained to dnscrypt-proxy with IPv6 addresses blocked, due to bind rewriting the NOERROR in SERVFAIL.

Try any Alpine image on Docker: it occurs immediately with a simple nslookup www.github.com ; echo $? in the said situation.
This is known, documented, and causes major headaches on K8s with Alpine images.

And as said, more and more applications rely on a system backend that executes a simultaneous A + AAAA query, if they don't do it themselves.
Even those that don't use musl libc follow the same assumption: they expect a NOERROR answer.
A SERVFAIL answer instead fails the whole query, even if one or more A records were correctly provided.
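
The failure mode can be illustrated with a small model. This is a simplified sketch of the behavior described in this comment, not musl's actual code; `Reply`, `strict_lookup` and the 192.0.2.1 documentation address are made up for illustration.

```python
from dataclasses import dataclass, field

NOERROR, SERVFAIL = 0, 2


@dataclass
class Reply:
    """Outcome of one DNS query: a reply code plus the addresses it carried."""
    rcode: int
    addresses: list = field(default_factory=list)


def strict_lookup(a_reply: Reply, aaaa_reply: Reply) -> list:
    """Merge the A and AAAA results, failing the whole lookup on any SERVFAIL."""
    for reply in (a_reply, aaaa_reply):
        if reply.rcode == SERVFAIL:
            # A usable A record in the other reply is discarded as well.
            raise RuntimeError("temporary failure in name resolution")
    return a_reply.addresses + aaaa_reply.addresses


a_reply = Reply(NOERROR, ["192.0.2.1"])

# AAAA answered with NOERROR and no address: the lookup succeeds.
print(strict_lookup(a_reply, Reply(NOERROR)))  # ['192.0.2.1']

# AAAA rewritten to SERVFAIL: the lookup fails despite the valid A record.
try:
    strict_lookup(a_reply, Reply(SERVFAIL))
except RuntimeError as err:
    print("lookup failed:", err)
```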

As for your hint to tweak blocked_query_response: even if it is set, it has no effect, because the block_ipv6 plugin has the HINFO answer hardcoded.

@jedisct1
Member

That's too much to read, but if what you are saying is that synthetic responses should always have the RA flag, and keep the RD flag of the query like non-synthetic responses, try 8d43ebf

@lifenjoiner
Member

Thought it was SERVFAIL in dnscrypt-proxy.

due to bind rewriting the NOERROR in SERVFAIL

Seems you already know where the problem is. It is bind9, SERVFAIL in/from bind9.

Try blocked_query_response = 'a:0.0.0.0,aaaa:::'. It does stop bind9 from modifying the reply code to SERVFAIL.

Bind9 is a recursive server, but dnscrypt-proxy is not. Setting dnscrypt-proxy as the upstream of bind9 does not seem to be a good practice.

@Daryes
Author

Daryes commented Nov 4, 2024

That's too much to read, but if what you are saying is that synthetic responses should always have the RA flag, and keep the RD flag of the query like non-synthetic responses, try 8d43ebf

Many thanks, @jedisct1
I've finally pinpointed the source of the problem, and Bind is happy now

It helped me tinker and find the real cause: the HINFO data is not expected by Bind.
Bind seems to expect either an answer with an AAAA record, or a response with no answer section at all. Not just empty, but completely missing.
I could validate this with the Windows DNS server: the packet returned for a non-existing AAAA record has no answer section.

By only commenting out this line in plugin_block_ipv6.go : synth.Answer = ... I got Bind to keep the NOERROR status instead of rewriting it to SERVFAIL.
Standard clients are more basic, which is why they are still happy with a HINFO answer.
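
As a rough model of the behavior observed above, the rule can be written as a small predicate. This is only a toy summary of these test results, not Bind's real validation logic; `forwarder_accepts` is a hypothetical name, and record types are passed as plain strings.

```python
def forwarder_accepts(qtype: str, answer_types: list) -> bool:
    """Model of the observation: does the reply keep its NOERROR status?

    `answer_types` lists the record types found in the answer section.
    """
    if not answer_types:
        # No answer section at all: accepted, like the Windows DNS reply.
        return True
    # Otherwise the answer section must contain the queried record type.
    return qtype in answer_types


print(forwarder_accepts("AAAA", ["AAAA"]))   # True: a real AAAA answer
print(forwarder_accepts("AAAA", []))         # True: no answer section
print(forwarder_accepts("AAAA", ["HINFO"]))  # False: the synthetic HINFO reply
```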

Now the bad news is that I was mistaken about the flags, and your latest commit changing EmptyResponseFromMessage has no effect; I tested with and without it.

If I may make a suggestion, aside from reverting the EmptyResponse commit, which has no effect:

Option 1:
Add a new parameter such as block_ipv6_no_hint_message_as_forwarder, or any name you prefer, set to false by default.
This parameter would be used as a conditional for the hinfo := new(dns.HINFO) ( . . . ) synth.Answer = []dns.RR{hinfo} block.

This way, the HINFO is still activated for everybody, and those who have a problem because a DNS server in front of dnscrypt-proxy pays a little too much attention can fall back to a "correct" message.

Option 2:
An alternative would be to implement support for blocked_query_response in the block_ipv6 plugin, which currently ignores it, with a new option "empty" that returns a response without an "answers" section.
This should also fix the similar problem for the IP blocklist plugin when dnscrypt-proxy is used as a forwarder by a Bind server.


@lifenjoiner
DoH and DoT are only slowly being implemented in Bind. You need v9.19.10 for both, which is not yet available in the distros; they only offer v9.18.x (aside from SID/unstable). So good practice or not, there was not much choice.
