@@ -39,7 +39,7 @@ Memcached is not essential but highly recommended.
### Ingester replication factor

The standard replication factor is three, so that we can drop one
- sample and be unconcerned, as we still have two copies of the data
+ replica and be unconcerned, as we still have two copies of the data
left for redundancy. This is configurable: you can run with more
redundancy or less, depending on your risk appetite.

@@ -150,12 +150,17 @@ The specific values here should be adjusted based on your own
experiences running Cortex - they are very dependent on rate of data
arriving and other factors such as series churn.

- ### Spread out ingesters
+ ### Take extra care with ingesters

- Don't run multiple ingesters on the same node, as that raises the risk
- that you will lost multiple replicas of data at the same time.
+ Ingesters hold hours of timeseries data in memory; you can configure
+ Cortex to replicate the data but you should take steps to avoid losing
+ all replicas at once:
+ - Don't run multiple ingesters on the same node.
+ - Don't run ingesters on preemptible/spot nodes.
+ - Spread out ingesters across racks / availability zones / whatever
+   applies in your datacenters.

- In Kubernetes this can be expressed as:
+ You can ask Kubernetes to avoid running on the same node like this:

```
affinity:
@@ -171,3 +176,34 @@ In Kubernetes this can be expressed as:
        - ingester
    topologyKey: "kubernetes.io/hostname"
```
+
+ Give plenty of time for an ingester to hand over or flush data to the
+ store when shutting down; for Kubernetes this looks like:
+
+ ```
+ terminationGracePeriodSeconds: 2400
+ ```
+
+ Ask Kubernetes to limit rolling updates to one ingester at a time, and
+ signal the old one to stop before the new one is ready:
+
+ ```
+ strategy:
+   rollingUpdate:
+     maxSurge: 0
+     maxUnavailable: 1
+ ```
+
+ Ingesters provide an HTTP hook to signal readiness when all is well;
+ this is valuable because it stops a rolling update at the first
+ problem:
+
+ ```
+ readinessProbe:
+   httpGet:
+     path: /ready
+     port: 80
+ ```
+
+ We do not recommend configuring a liveness probe on ingesters -
+ killing them is a last resort and should not be left to a machine.
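
For orientation, the fragments added in this change could sit together in a single Deployment roughly as sketched below. This is an illustration under assumptions, not the manifest the project ships: the resource name, the `name: ingester` label, the replica count, the container image, and the choice of a required (rather than preferred) anti-affinity rule are all placeholders, and the diff does not show the full `affinity` block. Only the `strategy`, `terminationGracePeriodSeconds`, `topologyKey`, and `readinessProbe` values come from the change itself.

```yaml
# Hypothetical ingester Deployment; names, labels, image and replica
# count are assumptions made for this sketch.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingester                  # assumed name
spec:
  replicas: 3                     # assumed; size for your own load
  selector:
    matchLabels:
      name: ingester              # assumed label
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0                 # no extra pod is created above the replica count
      maxUnavailable: 1           # replace one ingester at a time
  template:
    metadata:
      labels:
        name: ingester
    spec:
      # Time allowed for an ingester to hand over or flush before it is killed.
      terminationGracePeriodSeconds: 2400
      affinity:
        podAntiAffinity:
          # Assumption: a hard rule; a "preferred" rule is also possible.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: name
                operator: In
                values:
                - ingester
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: ingester
        image: example.invalid/cortex:placeholder   # placeholder image
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
        # Deliberately no livenessProbe, per the recommendation above.
```

With a required anti-affinity rule and `maxSurge: 0`, a replacement pod can only be scheduled once the old pod's node (or another free node) is available, so the cluster needs at least as many eligible nodes as ingester replicas.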