Commit 39c9fcf

Extend ingesters section

Signed-off-by: Bryan Boreham <[email protected]>

1 parent: 3265120

1 file changed: +41 -5 lines
docs/running.md

@@ -39,7 +39,7 @@ Memcached is not essential but highly recommended.
 ### Ingester replication factor
 
 The standard replication factor is three, so that we can drop one
-sample and be unconcerned, as we still have two copies of the data
+replica and be unconcerned, as we still have two copies of the data
 left for redundancy. This is configurable: you can run with more
 redundancy or less, depending on your risk appetite.
 
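For context on the hunk above: the replication factor is a ring setting, and three is the default. A minimal sketch of setting it explicitly, assuming the `-distributor.replication-factor` flag (verify the flag name against your Cortex version):

```
# Sketch: container args setting the ingester replication factor.
# The flag name is an assumption and may differ between Cortex versions.
args:
- -distributor.replication-factor=3
```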

@@ -150,12 +150,17 @@ The specific values here should be adjusted based on your own
 experiences running Cortex - they are very dependent on rate of data
 arriving and other factors such as series churn.
 
-### Spread out ingesters
+### Take extra care with ingesters
 
-Don't run multiple ingesters on the same node, as that raises the risk
-that you will lost multiple replicas of data at the same time.
+Ingesters hold hours of timeseries data in memory; you can configure
+Cortex to replicate the data but you should take steps to avoid losing
+all replicas at once:
+- Don't run multiple ingesters on the same node.
+- Don't run ingesters on preemptible/spot nodes.
+- Spread out ingesters across racks / availability zones / whatever
+  applies in your datacenters.
 
-In Kubernetes this can be expressed as:
+You can ask Kubernetes to avoid running on the same node like this:
 
 ```
 affinity:
@@ -171,3 +176,34 @@ In Kubernetes this can be expressed as:
           - ingester
       topologyKey: "kubernetes.io/hostname"
 ```
+
+Give plenty of time for an ingester to hand over or flush data to
+store when shutting down; for Kubernetes this looks like:
+
+```
+terminationGracePeriodSeconds: 2400
+```
+
+Ask Kubernetes to limit rolling updates to one ingester at a time, and
+signal the old one to stop before the new one is ready:
+
+```
+strategy:
+  rollingUpdate:
+    maxSurge: 0
+    maxUnavailable: 1
+```
+
+Ingesters provide an http hook to signal readiness when all is well;
+this is valuable because it stops a rolling update at the first
+problem:
+
+```
+readinessProbe:
+  httpGet:
+    path: /ready
+    port: 80
+```
+
+We do not recommend configuring a liveness probe on ingesters -
+killing them is a last resort and should not be left to a machine.
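The affinity block above only keeps ingesters apart at the node level (`kubernetes.io/hostname`). For the racks / availability zones bullet, the same pattern can use a zone-level topology key; a sketch, assuming nodes carry the `failure-domain.beta.kubernetes.io/zone` label (newer Kubernetes versions use `topology.kubernetes.io/zone`). It is `preferred` rather than `required` so that you can still run more ingesters than you have zones:

```
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - ingester
        # Spread replicas across zones, not just nodes
        topologyKey: "failure-domain.beta.kubernetes.io/zone"
```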

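Taken together, a sketch of where these fragments sit in a single ingester Deployment; the image tag and names here are illustrative, not taken from the commit:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingester
spec:
  replicas: 3
  strategy:
    rollingUpdate:
      maxSurge: 0        # never run more ingesters than `replicas`
      maxUnavailable: 1  # roll one ingester at a time
  selector:
    matchLabels:
      name: ingester
  template:
    metadata:
      labels:
        name: ingester
    spec:
      # the anti-affinity block from the doc goes here, under affinity:
      terminationGracePeriodSeconds: 2400  # time to hand over or flush
      containers:
      - name: ingester
        image: quay.io/cortexproject/cortex:v0.4.0  # illustrative tag
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
        # deliberately no livenessProbe; see the note above
```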