Description
Describe the bug
When starting a brand new ingester (empty disk, running blocks storage), as soon as the ingester is registered to the ring and its state switches to ACTIVE, it suddenly receives a bunch of new series. If you target each ingester to hold about 1.5M active series, it has to add 1.5M series to TSDB in a matter of a few seconds.
Today, while scaling out a large number of ingesters (50), a few of them experienced very high latency and a high number of in-flight requests. The high number of in-flight requests caused memory usage to grow until some of these ingesters were OOMKilled.
I've been able to profile the affected ingesters, and the following is what I've found so far:
1. The number of in-flight push requests skyrockets right after ingester startup
2. The number of TSDB appenders skyrockets too
3. The average cortex_ingester_tsdb_appender_add_duration_seconds skyrockets too
4. Lock contention in Head.getOrCreateWithID()
Unsurprisingly, looking at the number of active goroutines, 99.9% were blocked in Head.getOrCreateWithID() due to lock contention.
To Reproduce
I haven't found a way to easily reproduce it locally or with a stress test yet, but unfortunately it doesn't look that difficult to reproduce in production (where debugging is harder).
Storage Engine
- Blocks
- Chunks