CachingResolver: stop retrying permanently failing hosts #3950

@guizmaii

Problem

When expireAction == Refresh, a host that fails DNS resolution is retried once per unknownHostTtl (default: 1 minute), indefinitely. Each retry consumes a semaphore permit and makes a blocking system DNS call. For hosts that will never resolve, this cost is paid forever with no chance of success.

Proposal

Add a max retry count to CachingResolver. After a configurable number of consecutive failed refreshes, the entry would be dropped from the cache instead of being retried.

This would require:

  • A new failureCount field on CacheEntry, incremented in refreshOrDropEntries each time a failed entry is refreshed
  • A new maxRetries config field in DnsResolver.Config (with a sensible default)
  • When failureCount >= maxRetries, drop the entry regardless of expireAction
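
To make the proposal concrete, here is a minimal sketch of the drop decision. It is a simplified, hypothetical model: `MaxRetriesSketch`, the `failed` flag, and the `refreshOrDrop` helper are assumptions for illustration, and the real `CacheEntry` and `refreshOrDropEntries` in the codebase will look different. Only `failureCount`, `maxRetries`, and `expireAction` come from the proposal above.

```scala
// Hypothetical, simplified model of the proposal; not the actual zio-http types.
object MaxRetriesSketch {
  sealed trait ExpireAction
  object ExpireAction {
    case object Refresh extends ExpireAction
    case object Drop    extends ExpireAction
  }

  // failureCount is the proposed new field: consecutive failed refreshes so far.
  // A successful resolution would create a fresh entry with failureCount = 0.
  final case class CacheEntry(host: String, failed: Boolean, failureCount: Int = 0)

  // Decide what happens to an expired entry.
  // None       => drop the entry from the cache
  // Some(next) => keep the entry and schedule a refresh
  def refreshOrDrop(
    entry: CacheEntry,
    expireAction: ExpireAction,
    maxRetries: Int
  ): Option[CacheEntry] =
    expireAction match {
      case ExpireAction.Drop => None
      case ExpireAction.Refresh =>
        if (!entry.failed) Some(entry)
        else if (entry.failureCount >= maxRetries) None // give up on this host
        else Some(entry.copy(failureCount = entry.failureCount + 1))
    }
}
```

With `maxRetries = 3` in this sketch, a failing entry is refreshed three times and dropped when it expires a fourth time. A later lookup for the same host would simply miss the cache and resolve from scratch.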

Alternatives

  • Max failure TTL: Track when the host first started failing and drop after a duration ceiling. Similar complexity (new field on CacheEntry) but uses time rather than count.
  • Exponential backoff: Progressively increase the retry interval. Reduces resource usage without a hard cutoff, but adds complexity.
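
For comparison, both alternatives can be sketched as small pure functions. Everything here is hypothetical: `AlternativesSketch`, `firstFailureAt`, `maxFailureTtl`, `base`, and `maxInterval` are names invented for illustration, and the doubling factor and cap are assumptions rather than anything the resolver does today.

```scala
import java.time.{Duration, Instant}

// Hypothetical helpers illustrating the two alternatives; not existing zio-http code.
object AlternativesSketch {

  // Max failure TTL: drop once the host has been failing for at least maxFailureTtl.
  // Requires storing firstFailureAt on the cache entry.
  def exceededFailureTtl(firstFailureAt: Instant, now: Instant, maxFailureTtl: Duration): Boolean =
    Duration.between(firstFailureAt, now).compareTo(maxFailureTtl) >= 0

  // Exponential backoff: double the retry interval after each consecutive
  // failure, capped at maxInterval. Requires storing failureCount on the entry.
  def backoffInterval(base: Duration, failureCount: Int, maxInterval: Duration): Duration = {
    val shift  = math.min(math.max(failureCount, 0), 30) // bound the exponent
    val scaled = base.multipliedBy(1L << shift)
    if (scaled.compareTo(maxInterval) > 0) maxInterval else scaled
  }
}
```

Note that backoff on its own never removes the entry, so it would likely need the interval cap shown here, or a combination with one of the drop strategies, to bound the cost of a host that never resolves.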

@987Nabil Which solution do you prefer?
