Maybe It Shouldn’t Always Be DNS

It was DNS

DNS is one of the wonders of the modern world. Despite how you may feel about WebPKI or even DNSSEC, it’s truly a miracle that you can type https://example.com or http://very.large.horse into your browser and see a website on the other end. We’ve taken two notoriously hard problems - naming and cache invalidation - built a distributed system entirely out of those problems, and balanced the modern world on top. Shockingly, it (mostly) works.

Part of the success of the Domain Name System has been avoiding turning the protocol into an extensible database for arbitrary information. DNS records have to stay small so answers can make their way through all kinds of networks, stay compatible with ancient clients, and be cached locally by software of dubious quality. Another huge part of its success has been making the system almost entirely coordination-free - resolvers independently cache and spread out queries to reduce and distribute load.

There’s No Way It Should Have Been DNS

These are admirable qualities in a large, sprawling public system that is meant to be invisible. Inside a modern datacenter, they make the whole thing a miserable foundation for developing and operating modern software.

The small and rigid resource record format is extremely limiting. Want to stop serving the addresses of unhealthy hosts? Heck, want to return more than 25 addresses in a response? You’re out of luck. DNS is so tightly tuned for transiting the sketchy parts of the public internet that getting an extra ounce of data through, even when that data fits into an existing resource record, is like pulling teeth. If you try, I hope you like finding out which parts of your stack quietly truncate responses instead of upgrading from UDP to TCP, which parts are completely incapable of doing so, and which clients just pick the first address no matter what you try.
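That ceiling isn’t hyperbole: a classic DNS response over UDP has to fit in 512 bytes. Here’s a quick way to see the math for yourself with the dnspython library - the service name is made up, and the exact cutoff shifts with the length of your names:

```python
# How big is a response carrying N addresses? A quick experiment with
# dnspython (pip install dnspython). "service.internal.example" is a
# made-up name for illustration.
import dns.message
import dns.rdatatype
import dns.rrset

NAME = "service.internal.example."
query = dns.message.make_query(NAME, dns.rdatatype.A)

for count in range(1, 60):
    response = dns.message.make_response(query)
    addrs = [f"10.0.0.{i}" for i in range(1, count + 1)]
    response.answer.append(dns.rrset.from_text_list(NAME, 30, "IN", "A", addrs))
    size = len(response.to_wire())
    if size > 512:
        print(f"{count} A records take {size} bytes, past the classic "
              f"512-byte UDP limit; a real server would set the TC bit here")
        break
```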

Aggressive decentralization is a virtue on the public internet, but inside your own infrastructure it’s a bummer, because a centralized system is so much easier to manage. Want to debug a resolution failure? Monitoring DNS still mostly involves squinting at packet-loss graphs, and if you want to figure out why you got a particular answer or where it came from, you’re usually stuck SSHing to individual hosts and running one-off queries. The best tool for debugging DNS is named after groping around in the dark because, on a good day, that’s a polite description of the whole experience. On a bad day, all of your host caches expire before you figure out what’s going on. Even if you’re trying to fix the problem as a software author, things are bleak; not even libc exposes basic metadata like the TTL of the answer you just looked up, and there are deeply unpleasant surprises everywhere you look.
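To make the libc point concrete, here’s a small sketch comparing resolution through getaddrinfo with a direct query via dnspython. The TTL is right there on the wire; the standard interface just throws it away (example.com stands in for whatever you actually resolve):

```python
# libc-style resolution (getaddrinfo) hands back addresses with no TTL,
# so an application can't tell how long an answer is good for. A direct
# query with dnspython (pip install dnspython) exposes it.
import socket
import dns.resolver

host = "example.com"

# What most applications see: addresses only, TTL discarded.
addrs = {info[4][0] for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)}
print(f"getaddrinfo: {sorted(addrs)} (TTL: unknown)")

# What was actually on the wire: the answer carries a TTL.
answer = dns.resolver.resolve(host, "A")
print(f"direct query: {[r.address for r in answer]} (TTL: {answer.rrset.ttl}s)")
```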

All of this is reflected in the tools available to actually do something with DNS. Want reliable service discovery in a cloud environment where IPs change frequently? You’ll learn that “temporary failure in name resolution” can mean anything from “try again in 5 seconds” to “this host hasn’t existed for three days,” and your retry logic will need to account for both. Want to do split-horizon DNS so your US-East servers use the database replica in Virginia while your EU-West servers use the replica in Dublin? You’ve just entered extended-RFC territory, which multiplies the chances that some piece of your stack is missing support. Should you succeed, your reward is maintaining two separate DNS configurations that are almost, but not quite, identical.
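“Account for both” usually ends up looking something like the sketch below. EAI_AGAIN is the errno behind “temporary failure in name resolution,” and the caller just has to guess whether it’s actually temporary - the attempt count and delay here are arbitrary, because nothing in the error tells you how long to wait:

```python
# "Temporary failure in name resolution" is EAI_AGAIN; a name that
# genuinely doesn't exist is EAI_NONAME. A stdlib-only sketch of the
# retry logic DNS forces on every client.
import socket
import time

def resolve_with_retry(host: str, attempts: int = 3, delay: float = 5.0) -> list[str]:
    for attempt in range(attempts):
        try:
            return [info[4][0] for info in socket.getaddrinfo(host, None)]
        except socket.gaierror as e:
            if e.errno == socket.EAI_NONAME:
                raise  # the name really doesn't exist; retrying won't help
            if e.errno == socket.EAI_AGAIN and attempt < attempts - 1:
                time.sleep(delay)  # "temporary"... or the host is three days gone
                continue
            raise
```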

It’s Not DNS

Eventually you have to ask yourself: why are we still doing this to ourselves? There are plenty of reasons that distributed infrastructure makes sense for the public internet, but why are we pretending that we have exactly the same problems in the privacy of our own VPCs? It would be so nice to be able to return hundreds of addresses to a single client, to quickly remove them from a cache, or to get a centralized, up-to-date view of where requests across your fleet are being routed.

So, what if we get rid of it? DNS isn’t magic - it’s impressive, and it’s everywhere, but it isn’t magic. It’s a request/response protocol for looking up IP addresses, a key-value store, and a series of caches. It’s not hard to imagine doing better if you’re building for a datacenter. Yes, ripping out and replacing one of computing’s foundational protocols is a big hurdle. But a new protocol doesn’t need to go everywhere yet - just into the services that want to take advantage of it. And since DNS is so limited - it’s a function from a name to an IP address - we can mostly avoid rewriting application code.
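As a thought experiment, here’s roughly what “a function from a name to an IP address” could look like with a control plane in the loop. Everything in this sketch is hypothetical - the URL, the response shape, the health field - it’s the shape of the idea, not a real API:

```python
# A toy sketch of datacenter-native resolution: one HTTP call to a
# (hypothetical) central control plane instead of a DNS query. No
# 512-byte ceiling, no opaque caches in the middle, and the answer
# can carry health and routing metadata that DNS has no room for.
import json
import urllib.request

def resolve(service: str) -> list[str]:
    url = f"http://discovery.internal/v1/resolve/{service}"  # hypothetical endpoint
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    # Unhealthy endpoints never make it back to the client.
    return [ep["address"] for ep in body["endpoints"] if ep["healthy"]]
```

Because the interface is still a name going in and addresses coming out, something like this can slot in behind the same code paths that call a DNS resolver today.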

We think that’s a surmountable problem and that it’s worth the effort. The payoff is even better than just no longer suffering those “it was DNS” moments - with a more modern protocol and control plane, one that is not actively fighting you, you have access to a whole new set of service-building primitives.

Junction Labs is definitely not the first group of people to think these things. Depending on who you are, the future may already be here, even though it’s not evenly distributed. But we do think that the world can be better outside of companies with massive infrastructure budgets and that a simple, focused approach to this problem might actually be good enough that people want to replace DNS on private networks. 

Because even though it always is, sometimes it probably shouldn’t be DNS.
