Description
- Reported as: lp:1132725
- Reported by: lp:~tim-bunce
- Reported at: 2013-02-25T10:29:44Z
Imported from Launchpad using lp2gh.
I can't seem to find documentation that describes exactly what happens when a server goes down, and how best to handle its graceful recovery.
The documentation for the behaviours seems to assume a certain level of knowledge that I don't have, or at least I'm not sure about. It's a reference manual. I'm looking for a guide.
My question is similar to http://stackoverflow.com/questions/10029432/libmemcached-fail-over-of-a-clusters-node
except that a) I'd like more background information, and b) there's no discussion of handling the recovery of a node.
As an example, the venerable and original Cache::Memcached module (https://metacpan.org/source/DORMANDO/Cache-Memcached-1.30/lib/Cache/Memcached.pm) seems to recalculate the 'buck2sock' mapping (by $socket_cache_generation++;) whenever a socket is marked dead by _dead_sock() being called.
But some of the calls to _dead_sock, specifically those from connection setup, also provide a duration ($dead_for) during which the host should be regarded as dead.
So a single communications error on an established socket doesn't remove a server, but does cause a connection drop and reconnect. If that reconnect fails then the server is removed from the buck2sock calculation for a period.
This seems reasonable and is self-repairing. Sadly the libmemcached docs don't seem to discuss this behaviour at all.
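To make my reading of that behaviour concrete, here is a rough Python model of it. It is only a sketch of the logic as I understand it, not Cache::Memcached's actual code: the `ServerPool` class, its method names, and the injectable clock are all made up for illustration.

```python
import time


class ServerPool:
    """Toy model of the dead-server handling described above (hypothetical names)."""

    def __init__(self, servers, clock=time.monotonic):
        self.servers = list(servers)
        self.dead_until = {}   # server -> time after which it may be retried
        self.generation = 0    # bumped whenever the key-to-server mapping must be rebuilt
        self.clock = clock     # injectable so the timing can be tested

    def mark_dead(self, server, dead_for=0):
        """Model of a socket being marked dead.

        Without dead_for the connection is simply dropped and the server
        stays eligible, so the next request reconnects. With dead_for > 0
        (the failed-connect case) the server is excluded for that long.
        """
        self.generation += 1
        if dead_for > 0:
            self.dead_until[server] = self.clock() + dead_for

    def live_servers(self):
        """Servers currently eligible for the key-to-server mapping."""
        now = self.clock()
        return [
            s for s in self.servers
            if s not in self.dead_until or self.dead_until[s] <= now
        ]
```

In this model a server rejoins the pool automatically once its dead period expires, which is the self-repairing property I'd like to see documented for libmemcached.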
I'm rather surprised I can't find this kind of discussion in the libmemcached docs. Am I missing something?
The use-case is simply what I imagine a typical installation would be: a list of memcached servers and consistent hashing enabled.
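For background, this is the property I'd expect consistent hashing to give in that setup, shown as a generic hash-ring sketch in Python. It is not libmemcached's ketama implementation: the use of MD5, the 100 points per server, the addresses, and the function names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _point(label):
    # Map a string to a ring position (first 8 hex digits of its MD5).
    return int(hashlib.md5(label.encode()).hexdigest()[:8], 16)


def build_ring(servers, points_per_server=100):
    # Give each server many points on the ring so keys spread evenly.
    return sorted(
        (_point(f"{server}-{i}"), server)
        for server in servers
        for i in range(points_per_server)
    )


def lookup(ring, key):
    # A key belongs to the first server point at or after its own
    # position, wrapping around at the end of the ring.
    idx = bisect.bisect_left(ring, (_point(key), ""))
    return ring[idx % len(ring)][1]


servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
full = build_ring(servers)
degraded = build_ring(servers[:2])  # third server treated as down

keys = [f"key{i}" for i in range(1000)]
moved = [k for k in keys if lookup(full, k) != lookup(degraded, k)]
print(f"{len(moved)} of {len(keys)} keys changed server")
```

In this sketch only the keys that lived on the missing server change owner, and they map back to it as soon as the ring is rebuilt with all three servers. Whether and when libmemcached rebuilds its mapping like this is exactly what I can't find described.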
I was asked that question recently, realised that I wasn't sure of the answer, and was then doubly surprised not to find it documented.
I'm looking for a simple description of exactly what happens by default in libmemcached when a memcached server goes down (both for a clean TCP close and for a network outage leading to TCP timeouts) and when it comes back again later.
That would provide the background detail against which the various optional behaviours could be explained and understood.