Miek Gieben 35b40a84f2
plugin/cache: Fix filtering (#4148)
The filtering of DNSSEC records in the cache plugin was not done
correctly. Also, the change that introduced this bug didn't take into
account that the cache - by virtue of differentiating between DNSSEC and
no-DNSSEC - relied on not copying the data from the cache.

This change copies and then filters the data and factors the filtering
into a function that is used in two places (albeit with an ugly boolean
parameter to prevent copying things twice).

Add tests, do_test.go is moved to test/cache_test.go because the OPT
handling is done outside of the cache plugin. The core server re-attaches
the correct OPT when replying, so that makes for a better e2e test.

Added small unit test for filterRRslice and an explicit test that asks
for DNSSEC first and then plain, and vice versa to test cache behavior.

Fixes: #4146

Signed-off-by: Miek Gieben <miek@miek.nl>
2020-09-28 07:53:00 -07:00

cache

Name

cache - enables a frontend cache.

Description

With cache enabled, all records except zone transfers and metadata records will be cached for up to 3600s. Caching is most useful when fetching data from the backend (upstream, database, etc.) is expensive.

cache sets the DNSSEC OK (DO) bit on every query that passes through the plugin. If the client did not request DNSSEC records, they are filtered out of the reply.
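The filtering step can be pictured with a small Go sketch. It does not reproduce the plugin's code: the record type, the isDNSSEC helper, and the set of record types treated as DNSSEC-only are all assumptions made for illustration. The one point it tries to show is that the cached slice is copied and filtered rather than modified, so the same entry can serve both DNSSEC and plain clients.

```go
package main

import "fmt"

// record is a hypothetical, simplified stand-in for a DNS resource record.
type record struct {
	Name string
	Type string // for example "A", "RRSIG", "NSEC"
}

// isDNSSEC reports whether a record type is treated as DNSSEC-only in this
// sketch. The exact set of types is an assumption.
func isDNSSEC(t string) bool {
	switch t {
	case "RRSIG", "NSEC", "NSEC3":
		return true
	}
	return false
}

// filterForClient returns a filtered copy of rrs. DNSSEC records are dropped
// unless the client set the DO bit. The input slice is never modified.
func filterForClient(rrs []record, do bool) []record {
	out := make([]record, 0, len(rrs))
	for _, r := range rrs {
		if !do && isDNSSEC(r.Type) {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	cached := []record{
		{Name: "example.org.", Type: "A"},
		{Name: "example.org.", Type: "RRSIG"},
	}
	fmt.Println(len(filterForClient(cached, true)))  // 2: client asked for DNSSEC
	fmt.Println(len(filterForClient(cached, false))) // 1: RRSIG filtered out
}
```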

This plugin can only be used once per Server Block.

Syntax

cache [TTL] [ZONES...]
  • TTL max TTL in seconds. If not specified, the default maximum TTL is used, which is 3600 for NOERROR responses and 1800 for denial of existence ones. For example, cache 300 caches records for up to 300 seconds.
  • ZONES zones it should cache for. If empty, the zones from the configuration block are used.

Each element in the cache is cached according to its TTL (with TTL as the max). A cache is divided into 256 shards, each holding up to 39 items by default - for a total size of 256 * 39 = 9984 items.
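As a rough illustration of how a record's TTL is bounded, the Go sketch below clamps a TTL between a minimum and a maximum. The function name clampTTL is hypothetical. The maximum of 3600 is the default for NOERROR responses, and the minimum of 5 is the MINTTL default documented for the success and denial options.

```go
package main

import "fmt"

// clampTTL bounds a record's TTL (in seconds) to the range [minTTL, maxTTL].
// This is an illustrative helper, not the plugin's own function.
func clampTTL(ttl, minTTL, maxTTL uint32) uint32 {
	if ttl < minTTL {
		return minTTL
	}
	if ttl > maxTTL {
		return maxTTL
	}
	return ttl
}

func main() {
	fmt.Println(clampTTL(86400, 5, 3600)) // 3600: capped at the maximum
	fmt.Println(clampTTL(120, 5, 3600))   // 120: within bounds, unchanged
	fmt.Println(clampTTL(1, 5, 3600))     // 5: raised to the minimum
}
```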

If you want more control:

cache [TTL] [ZONES...] {
    success CAPACITY [TTL] [MINTTL]
    denial CAPACITY [TTL] [MINTTL]
    prefetch AMOUNT [[DURATION] [PERCENTAGE%]]
    serve_stale [DURATION]
}
  • TTL and ZONES as above.
  • success, override the settings for caching successful responses. CAPACITY indicates the maximum number of packets we cache before we start evicting (randomly). TTL overrides the cache maximum TTL. MINTTL overrides the cache minimum TTL (default 5), which can be useful to limit queries to the backend.
  • denial, override the settings for caching denial of existence responses. CAPACITY indicates the maximum number of packets we cache before we start evicting (randomly). TTL overrides the cache maximum TTL. MINTTL overrides the cache minimum TTL (default 5), which can be useful to limit queries to the backend. There is a third category (error) but those responses are never cached.
  • prefetch will prefetch popular items when they are about to be expunged from the cache. Popular means AMOUNT queries have been seen with no gaps of DURATION or more between them. DURATION defaults to 1m. Prefetching will happen when the remaining TTL drops below PERCENTAGE of the original TTL, or at the latest 1 second before TTL expiration. PERCENTAGE defaults to 10% and should be in the range [10%, 90%]. Note the percent sign is mandatory. PERCENTAGE is treated as an int.
  • serve_stale, when set, cache will always serve an expired entry to a client if one is available. When this happens, cache will attempt to refresh the cache entry after sending the expired entry to the client. The responses have a TTL of 0. DURATION is how far back to consider stale responses as fresh. The default duration is 1h.
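The prefetch threshold from the list above can be sketched in Go as follows. This is a minimal sketch under stated assumptions, not the plugin's code: the function name shouldPrefetch is hypothetical, the threshold is rounded up so that it never falls below 1 second, and treating the comparison as inclusive is a guess.

```go
package main

import "fmt"

// shouldPrefetch reports whether an item is due for prefetching.
// origTTL and remaining are in seconds; percentage is an integer such as 10.
// Rounding up and the inclusive comparison are assumptions of this sketch.
func shouldPrefetch(origTTL, remaining, percentage int) bool {
	threshold := (origTTL*percentage + 99) / 100 // integer ceiling
	if threshold < 1 {
		threshold = 1 // at the latest, 1 second before expiration
	}
	return remaining <= threshold
}

func main() {
	fmt.Println(shouldPrefetch(300, 31, 10)) // false: 31s left, threshold is 30s
	fmt.Println(shouldPrefetch(300, 30, 10)) // true: threshold reached
	fmt.Println(shouldPrefetch(5, 1, 10))    // true: 1 second before expiration
}
```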

Capacity and Eviction

If CAPACITY is not specified, the default cache size is 9984 per cache. The minimum allowed cache size is 1024. If CAPACITY is specified, the actual cache size used will be rounded down to the nearest number divisible by 256 (so all shards are equal in size).
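The rounding can be worked through with a short Go sketch. The helper name effectiveCapacity is hypothetical, and the minimum size of 1024 is deliberately not enforced here, so only the arithmetic is shown.

```go
package main

import "fmt"

// shardCount is the number of shards the cache is divided into.
const shardCount = 256

// effectiveCapacity rounds a requested capacity down to a multiple of
// shardCount and returns the per-shard and total sizes.
// It does not check the minimum allowed cache size.
func effectiveCapacity(requested int) (perShard, total int) {
	perShard = requested / shardCount // integer division rounds down
	return perShard, perShard * shardCount
}

func main() {
	for _, requested := range []int{9984, 5000, 2500} {
		perShard, total := effectiveCapacity(requested)
		fmt.Println(requested, perShard, total)
	}
	// Prints:
	// 9984 39 9984
	// 5000 19 4864
	// 2500 9 2304
}
```

Under these assumptions, the success 5000 and denial 2500 settings from the Examples section would yield effective sizes of 4864 and 2304.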

Eviction is done per shard. In effect, when a shard reaches capacity, items are evicted from that shard. Since shards don't fill up perfectly evenly, evictions will occur before the entire cache reaches full capacity. Each shard's capacity is equal to the total cache size divided by the number of shards (256). Eviction is random, not TTL based. Entries with 0 TTL will remain in the cache until randomly evicted when the shard reaches capacity.
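Per-shard eviction can be sketched in Go with a map-backed shard. This is an illustration under assumptions, not the plugin's implementation: the shard type and its methods are hypothetical, and the victim is simply the first key returned by ranging over the map. Go leaves map iteration order unspecified, so the choice is arbitrary and independent of TTL.

```go
package main

import "fmt"

// shard is a hypothetical fixed-capacity slice of the cache.
type shard struct {
	items    map[string]string
	capacity int
}

func newShard(capacity int) *shard {
	return &shard{items: make(map[string]string), capacity: capacity}
}

// add stores a value. If the key is new and the shard is full,
// one arbitrary entry is evicted first.
func (s *shard) add(key, val string) {
	if _, ok := s.items[key]; !ok && len(s.items) >= s.capacity {
		s.evict()
	}
	s.items[key] = val
}

// evict removes a single entry. Map iteration order is unspecified in Go,
// so the evicted entry is arbitrary and not chosen by TTL or age.
func (s *shard) evict() {
	for k := range s.items {
		delete(s.items, k)
		return
	}
}

func main() {
	s := newShard(39) // default per-shard capacity
	for i := 0; i < 100; i++ {
		s.add(fmt.Sprintf("name-%d.example.org.", i), "answer")
	}
	fmt.Println(len(s.items)) // 39: the shard never exceeds its capacity
}
```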

Metrics

If monitoring is enabled (via the prometheus plugin) then the following metrics are exported:

  • coredns_cache_entries{server, type} - Total elements in the cache by cache type.
  • coredns_cache_hits_total{server, type} - Counter of cache hits by cache type.
  • coredns_cache_misses_total{server} - Counter of cache misses.
  • coredns_cache_prefetch_total{server} - Counter of times the cache has prefetched a cached item.
  • coredns_cache_drops_total{server} - Counter of responses excluded from the cache due to request/response question name mismatch.
  • coredns_cache_served_stale_total{server} - Counter of requests served from stale cache entries.

Cache types are either "denial" or "success". Server is the server handling the request, see the prometheus plugin for documentation.

Examples

Enable caching for all zones, but cap everything to a TTL of 10 seconds:

. {
    cache 10
    whoami
}

Proxy to Google Public DNS and only cache responses for example.org (or below):

. {
    forward . 8.8.8.8:53
    cache example.org
}

Enable caching for example.org, keep a positive cache size of 5000 and a negative cache size of 2500:

example.org {
    cache {
        success 5000
        denial 2500
    }
}