Introducing RUM for DNS

Published on Tue, Mar 31, 2015 by Aaron

How good is the response time and availability of your authoritative DNS, really?

To answer that question, it's important to query authoritative nameservers through the resolvers that people use at home, in the office, on mobile and at the local Starbucks. TurboBytes does just that, monitoring the real-world performance of authoritative DNS providers from across the globe, 24/7, by running tests in the browsers of millions of people that are connected to thousands of networks.
We're excited to announce our RUM for DNS!

In this article you'll read about why we built RUM for DNS, our test methodology and the benefits of RUM (Real User Monitoring) versus synthetic monitoring. But maybe you want to skip all that and take a look at some of the data? View the past 14 days of performance for CloudFlare, AWS Route53, Dyn and others in our Authoritative DNS Performance Reports.

Why we developed RUM for DNS

TurboBytes runs a global Multi-CDN platform: it closely monitors CDN performance (with RUM) and makes sure traffic is always routed to the best performing CDN. Our platform constantly switches CDNs by changing low TTL CNAME records. Needless to say, our DNS needs to be awesome, with excellent performance and all the features we need.

We had been using Dyn’s DNS platform since 2012 and never had issues with performance, but we did run into must-have functional requirements that Dyn could not meet. In Q2 of last year we started looking into alternatives to Dyn, and performance was obviously a key evaluation criterion. We had to be sure the performance of our new DNS provider(s) was good across the globe, and we wanted a real-world view of authoritative DNS performance rather than a benchmark based on a handful of tests from a handful of datacenters. We needed ‘RUM for DNS’, so we built it.

How we measure real-world authoritative DNS performance

All TurboBytes Multi-CDN customers add our non-blocking JavaScript snippet to their webpages. It executes after page load and then silently runs tests in the background to measure the performance of a few CDN and DNS providers.

We want to give you some insight into what our JS code does for the DNS performance tests and the big challenge we ran into, but we’ll start by laying out our requirements.

Our requirements for RUM for DNS

  1. can measure response time and fail ratio (availability)
  2. timing data is accurate
  3. works with all resolvers, including resolvers that do NXDOMAIN hijacking
  4. our JS has no negative impact on the user experience
  5. works at least in Chrome and Chrome for Mobile
  6. is scalable and future-proof

We’re happy to say our solution meets all these requirements.

The challenge

The key challenge we quickly ran into was this: there is no way with JavaScript to instruct the browser to ‘do just a DNS lookup and let me know how long that took’. We first played a bit with dynamically inserting a dns-prefetch link element into the DOM but that was a dead end, simply because the browser does not expose how long the DNS lookup took. It did not take long to decide the only way forward was to use the Resource Timing API. This API exposes timing information for webpage resources. The API is implemented in IE10+, FF35+, Chrome, Opera and the default browser in Android 4.4+. We tested the behaviour of the API in all those browsers and found out that it had serious issues in IE and Firefox (DNS lookup data is missing, wrong or unreliable), and so we implemented a check in our JS to only run our RUM for DNS tests in Chrome and Opera for now.
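
A minimal sketch of how a DNS lookup time can be read from a Resource Timing entry. The helper names `dnsLookupTimeMs` and `latestEntryFor` are illustrative assumptions, not TurboBytes' actual code. `domainLookupStart` and `domainLookupEnd` are standard attributes of a Resource Timing entry; for cross-origin resources the browser reports them as 0 unless the server sends a `Timing-Allow-Origin` header, so the sketch treats that case as "no data".

```javascript
// Hypothetical helper: derive the DNS lookup time (in ms) from a
// PerformanceResourceTiming-like entry. Returns null when the browser
// did not expose usable DNS timing for the resource.
function dnsLookupTimeMs(entry) {
  if (!entry) return null;
  const start = entry.domainLookupStart;
  const end = entry.domainLookupEnd;
  if (typeof start !== 'number' || typeof end !== 'number') return null;
  // Both values are 0 for cross-origin fetches without Timing-Allow-Origin.
  if (start === 0 && end === 0) return null;
  if (end < start) return null;
  return end - start;
}

// Hypothetical helper: most recent Resource Timing entry for a URL.
// Only meaningful in a browser that implements the Resource Timing API.
function latestEntryFor(url) {
  if (typeof performance === 'undefined' || !performance.getEntriesByName) {
    return null;
  }
  const entries = performance.getEntriesByName(url, 'resource');
  return entries.length > 0 ? entries[entries.length - 1] : null;
}
```

In a browser, once a test object has been fetched, `dnsLookupTimeMs(latestEntryFor(testUrl))` would give the lookup time for that fetch, where `testUrl` is whatever URL was requested. A result of 0 typically means the hostname was already resolved locally, so no lookup was needed.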

The good thing about using the Resource Timing API is that in Chrome and Opera the data is reliable and accurate: we always get the real DNS lookup time. But we also want to reliably detect when the authoritative DNS was too slow, unreachable, down or sent a bad response, so we can track the Fail Ratio (availability) too and not just response time. Read the next section to find out how we accomplished this.

Our solution

TurboBytes’ RUM for DNS test methodology in a nutshell:

  • fetch a very small object from a TurboBytes webserver - going through the resolver only - and get the DNS Lookup Time from the Resource Timing API
  • if successful, do the same but now going through to the authoritative DNS

[Figure: RUM for DNS diagram]

Resolver HIT test

Before we run tests that go to authoritative, we always first run a test against an FQDN whose A record has a 24-hour TTL: the resolver hardly ever goes to authoritative because it has the response in cache. We have developed a way in JavaScript to force the browser/OS to go to the resolver, and not use the DNS response from its local cache (magic!). This test must complete within 5000 ms. If it does not, it’s likely our web server is not in good shape and we don’t run any performance tests hitting authoritative. If the resolver HIT test does complete within 5000 ms, we know two things:

  1. the time it takes to get a response from the resolver (nice to have)
  2. our web server is reachable and responding well

We’re good to go and run tests hitting the authoritative DNS.
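
The gating logic described above can be sketched as follows. This is an illustration under stated assumptions, not our production code: `withTimeout`, `runDnsTests`, `runHitTest` and `runMissTests` are hypothetical names, and the two test functions are assumed to return promises. Only the 5000 ms limit comes from the text.

```javascript
const HIT_TEST_LIMIT_MS = 5000; // time limit stated in the article

// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms);
  });
  // Clear the timer once either side has settled.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical gate. `runHitTest` and `runMissTests` are caller-supplied
// functions that return promises; both names are assumptions.
function runDnsTests(runHitTest, runMissTests, limitMs = HIT_TEST_LIMIT_MS) {
  return withTimeout(runHitTest(), limitMs).then(
    (hitResult) => runMissTests(hitResult),
    () => null // HIT test failed or timed out: skip the authoritative tests
  );
}
```

The returned promise resolves to `null` when the Resolver HIT test does not complete in time, so the caller can tell "tests skipped" apart from a real result without beaconing anything for the authoritative DNS.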

Resolver MISS test

Unlike the Resolver HIT test, the Resolver MISS tests have no time limit. Browsers and resolvers do their own retries and have their own timeout limits, so we simply let each test run. If the authoritative can’t be reached, is very slow or sends a bad response (anything other than NOERROR), then at some point the browser receives a SERVFAIL response from the resolver and our JS beacons a Fail for the authoritative. The test can’t have failed because of our web server, because just a few seconds earlier the Resolver HIT test showed that our server is reachable and responding fine. After monitoring the performance of several DNS providers for a few months, spotting jumps in Fail Ratio and talking to DNS providers about them, we know for a fact that our Fail Ratio metric is solid.
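
A minimal sketch of how a Resolver MISS outcome could be turned into a beacon and how a Fail Ratio could be computed from beacons. The beacon fields, the function names and the definition of Fail Ratio as fails divided by all counted tests are assumptions made for illustration; the article does not specify them.

```javascript
// Hypothetical beacon builder for one Resolver MISS test.
// `fetchSucceeded`: whether the test object loaded at all.
// `entry`: the Resource Timing entry for that fetch, or null.
function buildMissBeacon(provider, fetchSucceeded, entry) {
  if (!fetchSucceeded) {
    // The HIT test passed moments earlier, so the failure is attributed
    // to the authoritative DNS rather than to the web server.
    return { provider, result: 'fail' };
  }
  const hasTiming =
    entry &&
    entry.domainLookupEnd > 0 &&
    entry.domainLookupEnd >= entry.domainLookupStart;
  if (!hasTiming) return { provider, result: 'unknown' };
  return {
    provider,
    result: 'ok',
    dnsMs: entry.domainLookupEnd - entry.domainLookupStart,
  };
}

// Assumed definition: fails / (fails + ok). Beacons without usable
// timing are left out. Returns null when nothing can be counted.
function failRatio(beacons) {
  const counted = beacons.filter(
    (b) => b.result === 'ok' || b.result === 'fail'
  );
  if (counted.length === 0) return null;
  const fails = counted.filter((b) => b.result === 'fail').length;
  return fails / counted.length;
}
```

Keeping the classification in a pure function separates it from the network code, which makes it easy to check against recorded outcomes.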

Benefits of RUM versus synthetic monitoring

There are three important benefits of our RUM for DNS compared to the synthetic monitoring done by, for example, dnsperf.com, SolveDNS and CloudHarmony:

  • Relevance: our tests run in the browsers of millions of Internet users and go through real-world resolvers to authoritative
  • Reach: our tests run on many networks and through many different resolvers
  • Test frequency: our tests run often, not just once every 5 or 15 minutes

In the spotlight: The Netherlands

To give you a feel for our relevance, reach and test frequency, here are some numbers for March 29, 2015, based on beacons received from clients in The Netherlands for a single DNS provider:

Metric                             Value
Beacons                            98134
Unique Client IPs                  84981
Unique Client networks (ASNs)      146
Unique Resolver IPs                920
Unique Resolver networks (ASNs)    237

2.2% of all tests went through Google Public DNS (AS15169) and 0.5% of tests hit the authoritative via OpenDNS (AS36692).

Future

More DNS providers

VeriSign, UltraDNS, DNS Made Easy. Those are just some of the DNS providers we want to add to our RUM for DNS tracking. Who would you like to see added? Let us know on Twitter!

Increase reach and test frequency

In some countries and on some networks we want to run tests more often. Over the coming months we’ll increase the test count there.

Blog posts about DNS performance

We want to regularly publish blog posts about findings from our RUM for DNS data and things related to (authoritative) DNS performance. In the next article we will probably put the spotlight on the NSONE-Route53 combo.

We always welcome your thoughts, ideas and feedback. Please share below in the comments section and don’t forget to check out our Authoritative DNS Performance Reports.
