tapdance

Module version: v0.0.0-...-156e221
Published: May 15, 2026 License: Apache-2.0

README

Important: Tapdance is actively being developed and tested. Current functionality and performance may change without notice!

Tapdance

Tapdance is a configurable, near real-time authoritative DNS statistics collector based on dnstap.

It reports aggregate DNS query statistics such as queries per second per zone, query type, response code and various DNS flags. Tapdance enriches these statistics with resolver geolocation and with resolver latency, which it measures actively using ICMP pings. Statistics are pushed at configurable intervals. Tapdance runs as a stand-alone, stateless application on authoritative nameserver nodes and requires no central (external) collector.

Tapdance offers a number of benefits over DSC (DNS Statistics Collector):

  1. It is a self-contained, stateless application that requires no external processing (a single binary or stand-alone container)
  2. It provides latency information between resolvers and nameserver nodes
  3. It has a much shorter query-to-visualization delay
  4. No XML files are saved to disk waiting to be collected
  5. It is based on dnstap, a logging standard already implemented in all major nameserver software

Current limitations:

  1. Currently only supports reporting statistics to InfluxDB (a free-tier Cloud Serverless account suffices)
  2. Optional geolocation enrichment currently only supports MaxMind

dnstap and Tapdance are explicitly logging tools: they drop query logs when congested instead of slowing down the nameserver, and therefore have minimal impact on nameserver software performance.


Requirements

The current requirements for running Tapdance are:

  • Authoritative nameserver software capable of sending dnstap logs to a UNIX socket (or dnstap running as a stand-alone process)
  • An InfluxDB instance (may be a free-tier Cloud Serverless account)
  • MaxMind credentials (may be a free account) if you wish to enrich resolver data with geolocation information. Note that free accounts are limited to 30 downloads per day. Tapdance downloads the database on startup, so every restart counts towards this limit.

Usage

Docker image

A minimal Docker image may be built with the provided Dockerfile (make docker). Note the following requirements for running Tapdance in a Docker container:

  • Tapdance should be run with host networking: --network host
  • Two volume mounts are required: one to inject the configuration file, and one to expose the UNIX socket created by Tapdance, to which dnstap logs must be written. For example: -v /path/to/config.yaml:/config/config.yaml -v /shared/dnstap/location:/dnstap
  • Tapdance requires the NET_RAW capability to send ICMP pings: --cap-add NET_RAW

By default, the configuration file is expected at /config/config.yaml, but a different path may be passed as the first argument to the application.

Example:

docker run \
--network host --cap-add NET_RAW \
-v ./config.yaml:/etc/config.yaml \
-v ./dnstap:/dnstap \
-e INFLUX_TOKEN -e MAXMIND_ACCOUNT_ID -e MAXMIND_LICENSE_KEY \
tapdance /etc/config.yaml

Binary

Tapdance can be compiled to a single binary with make build (or go build -o tapdance ./cmd/tapdance). If it is not run as root, the binary requires the CAP_NET_RAW capability, which on Linux can be granted with, for example, sudo setcap cap_net_raw+ep ./tapdance. Run the binary with a configuration file: ./tapdance /path/to/config.yaml
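The configuration lookup described above (a default of /config/config.yaml, overridden by the first command-line argument) could be written roughly as follows. This is a minimal sketch, not Tapdance's actual code; configPath and readConfig are hypothetical names, and YAML parsing is deliberately left out.

```go
package main

import (
	"fmt"
	"os"
)

// defaultConfigPath is the location the container image expects.
const defaultConfigPath = "/config/config.yaml"

// configPath (hypothetical) returns the first command-line argument if one
// is given, and the default container path otherwise.
func configPath(args []string) string {
	if len(args) > 1 && args[1] != "" {
		return args[1]
	}
	return defaultConfigPath
}

// readConfig (hypothetical) reads the raw YAML bytes from the resolved path.
func readConfig(args []string) ([]byte, error) {
	path := configPath(args)
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading config %s: %w", path, err)
	}
	return data, nil
}
```

Called as readConfig(os.Args), ./tapdance /path/to/config.yaml reads the given file, while a bare ./tapdance inside the container falls back to /config/config.yaml.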


Configuration

Tapdance

Tapdance is configured with a single config.yaml file and a few secrets, which are expected as environment variables. This configuration may differ between running instances of Tapdance.

In the configuration file, we define global application configuration variables, as well as a set of configuration variables for each DNS zone you wish to capture with Tapdance. The configuration file is documented inline and should be self-explanatory.

Required environment variables are:

  • INFLUX_TOKEN – Token with write access to the InfluxDB host and bucket specified in config.yaml

When using MaxMind for geolocation lookups:

  • MAXMIND_ACCOUNT_ID – MaxMind account ID
  • MAXMIND_LICENSE_KEY – MaxMind license key

Optional:

  • NSID – Nameserver name or identifier. May also be set in config.yaml.

The MaxMind credentials must have access to the maxmindDownloadURL specified in config.yaml. GeoLite2-Country may be used if GeoIP2-Country is unavailable to you, at the cost of less accurate geolocation.

Nameserver software

Nameserver software must be instructed to write dnstap logs of both queries and responses to the UNIX domain socket(s) specified in config.yaml. Follow the documentation for Knot, NSD, BIND 9, Unbound, PowerDNS or CoreDNS. Tapdance creates the UNIX socket itself, so the nameserver software needs file (and directory) permissions to write to it. In the containerized application, Tapdance runs as UID 53.

Each UNIX socket to which dnstap logs are written must be specified in config.yaml, together with a name for that zone (or group of zones/domains). To gather query statistics for multiple zones (or domains) separately, define multiple instances of the dnstap module in the nameserver software, each writing to a separate UNIX socket. Find a minimal Knot config example here.


Visualizations with Grafana

Dashboard templates and example visualizations are provided here.


Performance

Load testing and performance in practice

During normal operation at 10K queries per second, Tapdance uses 0.2 CPU cores and 200MB memory.

An initial rough stress test using Knot DNS showed that Tapdance can handle 165K queries per second. Beyond this point Tapdance became congested, but Knot itself experienced no trouble and answered all queries. Since we rarely see such query rates in practice, we find this acceptable. Still, we will further investigate the exact bottleneck in Tapdance and continue to improve its performance.

Benchmarking individual components

We tested the maximum supported load for three parts of the application. These benchmarks are available in parser_test.go. For these tests, we saved a sample dnstap log of 50k representative DNS queries. To reproduce the benchmarks, capture a representative dnstap sample file with your nameserver software and run go test -v -bench=. -sample=/path/to/dnstap/sample from the internal/parsing directory. Check go.mod for the expected Go version.

The benchmark output below was generated on an M3 MacBook Pro.

Reading byte stream to frames

The dnstap module reads a byte stream of framestream frames containing protobuf-encoded messages from a UNIX socket; in the benchmark, we read from a file instead. We benchmark how fast framestream frames can be read from the input bytes and added to a Go channel.

BenchmarkFramestreamRead
    parser_test.go:39: 98354 frames read in 7.21521ms on average (13631481 frames/second ~= 6815740 queries/second with responses)
BenchmarkFramestreamRead-8           162           7215211 ns/op

At around 6.8M queries/second, this stage is clearly not a bottleneck.

Unmarshalling protobuf

This stage takes the channel of byte slices (framestream frames), unmarshals each one into a dnstap message following the dnstap protobuf schema, and sends the dnstap messages to a Go channel.

BenchmarkUnmarshalDnstap
    parser_test.go:111: 98354 frames unmarshalled in 34.364713ms on average (2862063 frames/second ~= 1431031 queries/second with responses)
BenchmarkUnmarshalDnstap-8            30          34364714 ns/op

Protobuf messages can be unmarshalled at around 1.4M queries/second, with responses.
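Structurally, this stage is a channel-to-channel transformation. The sketch below keeps the decoder as a parameter so it needs no external packages; in a real build it might be a closure that calls proto.Unmarshal (google.golang.org/protobuf) into the Dnstap type from github.com/dnstap/golang-dnstap, though whether Tapdance does exactly that is not stated here. unmarshalStage and decodeText are hypothetical names, and decodeText is only a stand-in used to exercise the stage.

```go
package main

import (
	"errors"
	"sync/atomic"
	"unicode/utf8"
)

// unmarshalStage reads raw frames from in, decodes each one, and forwards
// successfully decoded messages to the returned channel. Frames that fail to
// decode are counted and dropped. The output channel is closed once in is
// drained.
func unmarshalStage[T any](in <-chan []byte, decode func([]byte) (T, error), bufSize int) (<-chan T, *atomic.Int64) {
	out := make(chan T, bufSize)
	var failed atomic.Int64
	go func() {
		defer close(out)
		for frame := range in {
			msg, err := decode(frame)
			if err != nil {
				failed.Add(1) // malformed frame: count it and move on
				continue
			}
			out <- msg
		}
	}()
	return out, &failed
}

// decodeText is a stand-in decoder: it rejects frames that are not valid UTF-8.
func decodeText(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errors.New("frame is not valid UTF-8")
	}
	return string(b), nil
}
```

Because each stage owns its output channel and closes it when its input ends, stages can be chained and benchmarked independently, as the three benchmarks here do.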

Processing messages

We benchmark the processing of parsed dnstap messages from the message queue. This covers updating the statistics held in the various structs.

BenchmarkProcessMessages
    parser_test.go:195: 98354 dnstap messages processed in 37.620529ms on average (2614370 msg/second ~= 1307185 queries/second with responses)
BenchmarkProcessMessages-8            31          37620530 ns/op

Tapdance processes dnstap messages at about 1.3M queries per second, with responses. Note that this benchmark runs with minimal lock contention; contention will increase when Tapdance runs for multiple zones. Message processing may also buffer briefly when maps are switched in a reporting cycle, or during other processes that iterate over maps; this is not captured in the benchmark either.
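Per-zone counters with a map switch at the end of each reporting cycle might be structured roughly as follows. This is a hypothetical sketch, not Tapdance's code: Message is a heavily simplified stand-in for a parsed dnstap message, and counting queries and query types from query messages while counting response codes from response messages is an assumption.

```go
package main

import "sync"

// Message is a simplified stand-in for a parsed dnstap message.
type Message struct {
	IsResponse bool
	QType      string // e.g. "A", "AAAA"
	RCode      string // meaningful for responses, e.g. "NOERROR"
}

// ZoneStats holds aggregate counters for one zone in the current interval.
type ZoneStats struct {
	mu      sync.Mutex
	Queries uint64
	QTypes  map[string]uint64
	RCodes  map[string]uint64
}

func NewZoneStats() *ZoneStats {
	return &ZoneStats{QTypes: map[string]uint64{}, RCodes: map[string]uint64{}}
}

// Process updates the counters for one message. Every call takes the lock,
// which is where contention grows when several zones share resources.
func (s *ZoneStats) Process(m Message) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if m.IsResponse {
		s.RCodes[m.RCode]++
		return
	}
	s.Queries++
	s.QTypes[m.QType]++
}

// Snapshot returns the counters for reporting and switches in fresh maps,
// so the next interval starts from zero. Writers wait only for the swap.
func (s *ZoneStats) Snapshot() (queries uint64, qtypes, rcodes map[string]uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	queries, qtypes, rcodes = s.Queries, s.QTypes, s.RCodes
	s.Queries, s.QTypes, s.RCodes = 0, map[string]uint64{}, map[string]uint64{}
	return
}
```

Swapping maps instead of copying them keeps the time spent under the lock independent of how many keys were counted during the interval.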

While a map is being iterated over, writers cannot access it. Various processes in Tapdance iterate over the resolver statistics and information maps, e.g. gathering statistics for reporting, updating geolocations, and purging stale resolvers. To relieve this blocking, writes are temporarily redirected to a buffer, which is merged into the main map once the iteration is complete. This short synchronization step between the buffer and the main map takes a few milliseconds at most. The dnstap message buffer allows for 5 ms of blocking at 1M queries per second.
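The buffer-during-iteration scheme described above can be sketched as follows. ResolverCounts and its methods are hypothetical, and counting queries per resolver IP is only an illustrative payload; the point is that writers never wait for the iteration itself, only for the short merge at the end.

```go
package main

import "sync"

// ResolverCounts is a per-resolver counter map. While Range iterates over
// the main map, writes are redirected to a side buffer that is merged back
// when the iteration finishes.
type ResolverCounts struct {
	iterMu    sync.Mutex // serializes Range calls
	mu        sync.Mutex // guards the fields below
	main      map[string]uint64
	buf       map[string]uint64
	iterating bool
}

func NewResolverCounts() *ResolverCounts {
	return &ResolverCounts{main: map[string]uint64{}, buf: map[string]uint64{}}
}

// Inc counts one query from a resolver without waiting for an iteration.
func (r *ResolverCounts) Inc(ip string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.iterating {
		r.buf[ip]++
		return
	}
	r.main[ip]++
}

// Get returns the count in the main map; buffered writes appear after merge.
func (r *ResolverCounts) Get(ip string) uint64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.main[ip]
}

// Range calls fn for every entry in the main map. The main map is read
// without holding mu because no writer modifies it while iterating is true.
func (r *ResolverCounts) Range(fn func(ip string, n uint64)) {
	r.iterMu.Lock()
	defer r.iterMu.Unlock()

	r.mu.Lock()
	r.iterating = true
	r.mu.Unlock()

	for ip, n := range r.main {
		fn(ip, n)
	}

	// Short synchronization step: fold buffered writes into the main map.
	r.mu.Lock()
	for ip, n := range r.buf {
		r.main[ip] += n
	}
	r.buf = map[string]uint64{}
	r.iterating = false
	r.mu.Unlock()
}
```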

Conclusion

Under maximum load, processing parsed dnstap messages is the bottleneck, as it is the slowest of the three benchmarked stages. When the message buffer is completely full, the UNIX socket congests, which in turn fills the ring buffer of the dnstap producer and results in front-loss of log messages.
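The principle that a logging pipeline should drop messages rather than block its producer can be illustrated with a non-blocking send on a bounded channel. This is a hypothetical sketch (droppingQueue is not a Tapdance or dnstap type): it discards the incoming message when the buffer is full, which is not necessarily the same loss policy as a given dnstap producer's ring buffer.

```go
package main

import "sync/atomic"

// droppingQueue is a bounded queue that never blocks the producer: when the
// buffer is full, the new message is discarded and counted.
type droppingQueue struct {
	ch      chan []byte
	dropped atomic.Uint64
}

func newDroppingQueue(size int) *droppingQueue {
	return &droppingQueue{ch: make(chan []byte, size)}
}

// Offer enqueues msg if there is room and reports whether it was accepted.
func (q *droppingQueue) Offer(msg []byte) bool {
	select {
	case q.ch <- msg:
		return true
	default:
		q.dropped.Add(1) // congested: lose the log message, keep the producer fast
		return false
	}
}
```

Exporting the drop counter alongside the statistics would make it visible when reported numbers undercount the real query load.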


Application schematic

Below is a high-level schematic overview of the application.

[Figure: Tapdance application schematic]


Contributing

Contributions are welcome! Contributions on the following topics are particularly appreciated:

  • Load / performance testing (end-to-end)
  • Performance optimizations
  • Additional reporting destinations besides InfluxDB (or abstracted away with OpenTelemetry)
  • Additional geolocation options besides MaxMind Country

Directories

  • cmd/tapdance (command)
  • internal
