Posted by Matthew Maurer and Mike Yu, Android team
To help keep Android users’ DNS queries private, Android supports encrypted
DNS. In addition to existing support for DNS-over-TLS, Android now supports
DNS-over-HTTP/3, which offers a number of improvements over DNS-over-TLS.
Most network connections begin with a DNS lookup. While transport security may
be applied to the connection itself, that DNS lookup has traditionally not
been private by default: the base DNS protocol is raw UDP with no encryption.
While the internet has migrated to TLS over time, DNS has a bootstrapping
problem. Certificate verification relies on the domain name of the other
party, but learning which resolver to trust either requires DNS itself or
moves the problem to DHCP (which may be maliciously controlled). This issue is
mitigated by central resolvers like Google, Cloudflare, OpenDNS and Quad9,
which allow devices to configure a single DNS resolver locally for every
network, overriding what is offered through DHCP.
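To make the privacy problem concrete, here is a minimal Rust sketch of a standard DNS query in the RFC 1035 wire format, the packet that is traditionally sent as plain UDP. The function names `encode_qname` and `build_query` are hypothetical, and name compression, EDNS, and response handling are omitted. The point is that the queried name travels as readable ASCII bytes, so anyone on the network path can see it.

```rust
/// Encodes a domain name as length-prefixed labels ending in a zero-length
/// root label (RFC 1035 §3.1). Panics on empty or over-long labels.
fn encode_qname(name: &str) -> Vec<u8> {
    let mut out = Vec::new();
    let trimmed = name.trim_end_matches('.');
    if !trimmed.is_empty() {
        for label in trimmed.split('.') {
            assert!(!label.is_empty() && label.len() <= 63, "invalid label: {label:?}");
            out.push(label.len() as u8);
            out.extend_from_slice(label.as_bytes());
        }
    }
    out.push(0); // root label terminates the name
    out
}

/// Builds an unencrypted DNS query for one record type (1 = A, 28 = AAAA).
fn build_query(id: u16, name: &str, qtype: u16) -> Vec<u8> {
    let mut msg = Vec::new();
    msg.extend_from_slice(&id.to_be_bytes()); // ID, echoed back in the response
    msg.extend_from_slice(&0x0100u16.to_be_bytes()); // flags: standard query, recursion desired
    msg.extend_from_slice(&1u16.to_be_bytes()); // QDCOUNT: one question
    msg.extend_from_slice(&[0u8; 6]); // ANCOUNT, NSCOUNT, ARCOUNT: zero
    msg.extend_from_slice(&encode_qname(name)); // QNAME, readable by anyone on path
    msg.extend_from_slice(&qtype.to_be_bytes()); // QTYPE
    msg.extend_from_slice(&1u16.to_be_bytes()); // QCLASS: IN (internet)
    msg
}
```

For example, `build_query(0x1234, "example.com", 1)` yields a 29-byte packet (12-byte header, 13-byte name, 4 bytes of type and class) in which the bytes `example` and `com` appear verbatim.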
In Android 9.0, we announced the Private DNS feature, which uses
DNS-over-TLS (DoT) to protect DNS queries when enabled and supported by the
server. Unfortunately, DoT incurs overhead for every DNS request. An
alternative encrypted DNS protocol, DNS-over-HTTPS (DoH), is rapidly gaining
traction within the industry, as DoH has already been deployed by most public
DNS operators, including the Cloudflare Resolver and Google Public DNS. While
using HTTPS alone will not reduce the overhead significantly, HTTP/3 uses
QUIC, a transport that efficiently multiplexes multiple streams over UDP using
a single TLS session with session resumption. All of these features are
crucial to efficient operation on mobile devices.
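DoH carries the same DNS wire-format message inside an HTTPS request. The Rust sketch below assumes the RFC 8484 GET form, in which the message is base64url-encoded without padding into the `dns` query parameter. The helper names and the resolver URI `https://resolver.example/dns-query` are hypothetical; with DoH3, the resulting request would travel over an HTTP/3 (QUIC) connection rather than over TCP.

```rust
/// RFC 4648 §5 "URL and filename safe" alphabet.
const B64URL: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// base64url without '=' padding, as RFC 8484 requires for the `dns` GET parameter.
fn base64url_nopad(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() * 4 + 2) / 3);
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(*chunk.get(1).unwrap_or(&0));
        let b2 = u32::from(*chunk.get(2).unwrap_or(&0));
        let group = (b0 << 16) | (b1 << 8) | b2;
        // k input bytes produce k + 1 output characters (no padding).
        for i in 0..=chunk.len() {
            let index = (group >> (18 - 6 * i)) & 0x3F;
            out.push(char::from(B64URL[index as usize]));
        }
    }
    out
}

/// Builds a DoH GET URL from a resolver URI and a DNS wire-format message.
fn doh_get_url(resolver_uri: &str, dns_message: &[u8]) -> String {
    format!("{}?dns={}", resolver_uri, base64url_nopad(dns_message))
}
```

Encoding a query for `www.example.com` (ID 0, type A, recursion desired) this way produces `https://resolver.example/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB`. RFC 8484 recommends an ID of 0 so that identical queries map to identical, cacheable URLs.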
DNS-over-HTTP/3 (DoH3) support was released as part of a Google Play system
update, so by the time you’re reading this, Android devices from Android 11
onwards1 will use DoH3 instead of DoT for well-known2 DNS servers that
support it. Which DNS service you are using is unaffected by this change;
only the transport will be upgraded. In the future, we aim to support DDR
(Discovery of Designated Resolvers), which will allow us to dynamically
select the correct configuration for any server. This feature should
decrease the performance impact of encrypted DNS.
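The upgrade rule described above (same resolver, better transport when the server is known to support it) can be sketched as a simple lookup. This is not Android's actual selection logic: the `Upstream` and `Transport` types, `select_upstream`, and the allow-list contents are illustrative assumptions. DDR would replace a static list like this with discovery, by querying the resolver for `_dns.resolver.arpa` SVCB records.

```rust
/// Encrypted transports a private DNS upstream can use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transport {
    DnsOverTls,
    DnsOverHttp3,
}

/// The user's configured resolver plus the transport chosen for it.
#[derive(Debug, PartialEq, Eq)]
struct Upstream {
    host: String,
    transport: Transport,
}

/// Illustrative allow-list of hostnames assumed to serve DoH3 (not Android's real list).
const KNOWN_DOH3_HOSTS: &[&str] = &["dns.google", "one.one.one.one"];

/// Keeps the user's chosen resolver and only upgrades the transport for known hosts.
fn select_upstream(private_dns_host: &str) -> Upstream {
    // Hostnames are case-insensitive and may carry a trailing root dot.
    let host = private_dns_host.trim_end_matches('.').to_ascii_lowercase();
    let transport = if KNOWN_DOH3_HOSTS.contains(&host.as_str()) {
        Transport::DnsOverHttp3
    } else {
        Transport::DnsOverTls
    };
    Upstream { host, transport }
}
```

Note that the returned `host` is always the one the user configured; only `transport` varies, mirroring the statement that the DNS service itself is unaffected.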
Performance
DNS-over-HTTP/3 avoids several problems that can occur with DNS-over-TLS
operation:
As DoT operates on a single stream of requests and responses, many server
implementations suffer from head-of-line blocking3. This means that if the
request at the front of the line takes a while to resolve (possibly because
a recursive resolution is necessary), responses for subsequent requests that
would otherwise have been resolved quickly are blocked waiting on that first
request. DoH3, by comparison, runs each request over a separate logical
stream, so implementations can return responses out of order by default.
Mobile devices change networks frequently as the user moves around. With
DoT, these events require a full renegotiation of the connection. By
contrast, QUIC, the transport underlying HTTP/3, can resume a suspended
connection in a single round trip (RTT).
DoT relies on many queries sharing the same connection to amortize the cost
of the initial TCP and TLS handshakes. Unfortunately, in practice several
factors (such as network disconnects or server TCP connection management)
make these connections less long-lived than we might like. Once a connection
is closed, establishing it again requires at least 1 RTT.
In unreliable networks, DoH3 may even outperform traditional DNS. While
unintuitive, this is because QUIC's acknowledgement and loss-detection
mechanisms can alert either party that packets weren’t received, so lost
packets are retransmitted promptly. In traditional DNS, a client cannot
distinguish a lost packet from a slow resolution, so the timeout for a query
needs to be based on the expected time for the entire query, not just for
the resolver to receive the packet.
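The head-of-line and reconnection effects above can be captured with a toy latency model. The Rust sketch below assumes all queries are sent at time zero, that a DoT server writes responses strictly in request order, and that DoH3 delivers each stream's response as soon as it is ready. The setup round-trip counts are textbook figures for TCP plus TLS 1.3 versus QUIC (ignoring TCP Fast Open, packet loss, and whether 0-RTT is actually enabled), not measurements, and all names are hypothetical.

```rust
/// Delivery time of each response when the server answers strictly in request
/// order: a fast answer waits behind any slower answer queued ahead of it.
fn in_order_delivery(resolve_ms: &[u32]) -> Vec<u32> {
    let mut slowest_so_far: u32 = 0;
    resolve_ms
        .iter()
        .map(|&t| {
            slowest_so_far = slowest_so_far.max(t);
            slowest_so_far
        })
        .collect()
}

/// With one logical stream per query, each response is delivered when it is ready.
fn per_stream_delivery(resolve_ms: &[u32]) -> Vec<u32> {
    resolve_ms.to_vec()
}

/// How a connection is (re)established before queries can flow.
#[derive(Clone, Copy)]
enum Setup {
    /// TCP handshake (1 RTT) followed by a TLS 1.3 handshake (1 RTT).
    DotTls13,
    /// Combined QUIC transport and TLS 1.3 handshake (1 RTT).
    Doh3Fresh,
    /// QUIC 0-RTT resumption: the query rides in the client's first flight.
    Doh3Resumed,
}

/// Round trips spent on setup before the first query can be sent.
fn setup_rtts(setup: Setup) -> u32 {
    match setup {
        Setup::DotTls13 => 2,
        Setup::Doh3Fresh => 1,
        Setup::Doh3Resumed => 0,
    }
}

/// Average round trips per query when `queries` sequential queries (1 RTT each)
/// share one connection, so the setup cost is spread over its lifetime.
fn amortized_rtts_per_query(setup: Setup, queries: u32) -> f64 {
    assert!(queries > 0, "a connection must carry at least one query");
    f64::from(setup_rtts(setup) + queries) / f64::from(queries)
}
```

For three queries that take 300 ms, 20 ms, and 25 ms to resolve, in-order delivery makes all three wait 300 ms, whereas per-stream delivery returns the last two after 20 ms and 25 ms. Likewise, a DoT connection torn down after a single query spends 3 RTTs on it, while one that carries 10 queries averages 1.2 RTTs per query, which is why short-lived connections hurt.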
Field measurements during the initial limited rollout of this feature show
that DoH3 significantly improves on DoT’s performance. For successful
queries, our studies showed that replacing DoT with DoH3 reduces median
query time by 24% and 95th percentile query time by 44%. While it might
seem suspect that the reported data is conditioned on successful queries,
both DoT and DoH3 resolve 97% of queries successfully, so their metrics
are directly comparable. UDP resolves only 83% of queries successfully, and
because connectionless protocols have a different notion of what a "query"
is, UDP latency is not directly comparable to DoT/DoH3 latency. We have
still included it for rough comparison.
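Summarizing latency over successful queries only is fair when the protocols succeed at similar rates, as DoT and DoH3 do here. A minimal Rust sketch of that bookkeeping follows, assuming per-query results are recorded as `Some(latency_ms)` for a success or `None` for a failure, and using the nearest-rank percentile definition; the helper names are hypothetical, and this is not the analysis pipeline behind the numbers reported above.

```rust
/// Latency summary computed over successful queries only.
#[derive(Debug)]
struct Summary {
    success_rate: f64,
    p50_ms: f64,
    p95_ms: f64,
}

/// Nearest-rank percentile of an ascending, non-empty slice (`p` in 0..=100).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = ((p / 100.0) * sorted.len() as f64).ceil().max(1.0) as usize;
    sorted[rank.min(sorted.len()) - 1]
}

/// `results[i]` is `Some(latency_ms)` for a successful query and `None` for a failure.
fn summarize(results: &[Option<f64>]) -> Option<Summary> {
    // Keep only successful queries; failures count toward the success rate only.
    let mut ok: Vec<f64> = results.iter().flatten().copied().collect();
    if ok.is_empty() {
        return None;
    }
    ok.sort_by(|a, b| a.total_cmp(b));
    Some(Summary {
        success_rate: ok.len() as f64 / results.len() as f64,
        p50_ms: percentile(&ok, 50.0),
        p95_ms: percentile(&ok, 95.0),
    })
}

/// Percentage reduction from `before` to `after` (e.g. DoT versus DoH3 median).
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}
```

Read the other way, a 24% reduction in the median means DoH3's median query time is 0.76 times DoT's, and a 44% reduction at the 95th percentile means 0.56 times.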
Memory Safety
The DNS resolver processes input that could potentially be controlled by
an attacker, both from the network and from apps on the device. To reduce
the risk of security vulnerabilities, we chose to use a memory-safe
language for the implementation.
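As an illustration of why a memory-safe language helps with attacker-controlled input, the following Rust sketch decodes a possibly compressed domain name (RFC 1035 §4.1.4) from an untrusted DNS message. It is not taken from Android's resolver; `read_name` and the limit of 16 pointer jumps are assumptions. Every byte access goes through `slice::get`, so truncated or malicious input yields `None` instead of an out-of-bounds read, and the jump limit stops pointer loops that could otherwise hang the parser.

```rust
/// Maximum number of compression pointers followed before giving up (assumed limit).
const MAX_POINTER_JUMPS: u32 = 16;

/// Decodes a domain name starting at `pos` in an untrusted DNS message.
/// Returns `None` on truncated input, unsupported label types, or pointer loops.
fn read_name(msg: &[u8], mut pos: usize) -> Option<String> {
    let mut labels: Vec<String> = Vec::new();
    let mut jumps = 0;
    loop {
        // `get` returns None past the end instead of reading out of bounds.
        let len = usize::from(*msg.get(pos)?);
        match len {
            0 => break, // root label: end of name
            l if (l & 0xC0) == 0xC0 => {
                // Compression pointer: a 14-bit offset spread over two bytes.
                let low = usize::from(*msg.get(pos + 1)?);
                pos = ((l & 0x3F) << 8) | low;
                jumps += 1;
                if jumps > MAX_POINTER_JUMPS {
                    return None;
                }
            }
            l if l <= 63 => {
                // Ordinary label: `l` bytes of text follow the length byte.
                let label = msg.get(pos + 1..pos + 1 + l)?;
                labels.push(String::from_utf8_lossy(label).into_owned());
                pos += 1 + l;
            }
            _ => return None, // other label types are rejected
        }
    }
    Some(labels.join("."))
}
```

A pointer that refers back to itself, such as the two bytes `0xC0 0x00` at offset 0, is rejected after the jump limit instead of looping forever.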
Fortunately, we’ve been adding Rust support to the Android platform. This
effort is intended exactly for cases like this: system-level features that
need to be performant or low-level (both, in this case) and that would carry
risk to implement in C++. While we previously launched Keystore 2.0 in Rust,
this represents our first foray into Rust in Mainline Modules. Cloudflare
maintains an HTTP/3 library called quiche, which fits our use case well, as
it has a memory-safe implementation, few dependencies, and a small code size.
Quiche can also be used directly from C++. We considered this, but even the
request dispatching service had sufficient complexity that we chose to
implement that portion in Rust as well.
We built the query engine using the Tokio async framework to simultaneously
handle new requests, incoming packet events, control signals, and timers. In
C++, this would likely have required multiple threads or a carefully crafted
event loop. By leveraging async Rust, this occurs on a single thread with
minimal locking4. The DoH3 implementation is 1,640 lines and uses a single
runtime thread. By comparison, DoT takes 1,680 lines while managing less and
using up to 4 threads per DoT server in use.
Safety and Performance — Together at Last
With the introduction of Rust, we are able to improve both security and
performance at the same time. Likewise, QUIC allows us to improve network
performance and privacy simultaneously. Finally, Mainline ensures that such
improvements reach more Android users sooner.
Acknowledgements
Special thanks to Luke Huang, who contributed greatly to the development of
this feature, and Lorenzo Colitti for his in-depth review of the technical
aspects of this post.