Commit d7e1c6d

Fix store-gateway sharding ring CLI flags prefix (#3201) (#3202)
* Fix store-gateway sharding ring CLI flags prefix
  Signed-off-by: Marco Pracucci <[email protected]>
* Updated doc
  Signed-off-by: Marco Pracucci <[email protected]>
* Fixed integration tests
  Signed-off-by: Marco Pracucci <[email protected]>
1 parent bb5fcc9 commit d7e1c6d

10 files changed, 47 additions and 51 deletions

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -5,11 +5,13 @@
 ## 1.4.0-rc.0 / 2020-09-15

 * [CHANGE] Cassandra backend support is now GA (stable). #3180
-* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
+* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180 #3201
   - `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
   - `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
   - `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
   - `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
+  - `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
+  - `-store-gateway.tokens-file-path` flag renamed to `store-gateway.sharding-ring.tokens-file-path`
 * [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
 * [CHANGE] Distributor API endpoints are no longer served unless target is set to `distributor` or `all`. #3112
 * [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
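
The two renames added to this changelog entry can be applied mechanically to an existing argument list. The following is a minimal sketch, not part of the commit: `renamedFlags` and `migrateArgs` are hypothetical names, and the sketch assumes arguments are written as `-flag=value` or bare `-flag` with a single leading dash.

```go
package main

import (
	"fmt"
	"strings"
)

// renamedFlags maps the old flag names to the new ones, exactly as listed
// in the changelog entry above. The map itself is a hypothetical helper.
var renamedFlags = map[string]string{
	"-store-gateway.replication-factor": "-store-gateway.sharding-ring.replication-factor",
	"-store-gateway.tokens-file-path":   "-store-gateway.sharding-ring.tokens-file-path",
}

// migrateArgs returns a copy of args with renamed flags rewritten.
// Assumption: each argument is "-flag=value" or "-flag"; the
// "-flag value" and "--flag" forms are not handled by this sketch.
func migrateArgs(args []string) []string {
	out := make([]string, 0, len(args))
	for _, arg := range args {
		name, value, hasValue := strings.Cut(arg, "=")
		if renamed, ok := renamedFlags[name]; ok {
			name = renamed
		}
		if hasValue {
			out = append(out, name+"="+value)
		} else {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	old := []string{"-target=store-gateway", "-store-gateway.replication-factor=3"}
	fmt.Println(migrateArgs(old))
}
```

Flags that are not in the map pass through unchanged, so the helper is safe to run over a full command line.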

docs/blocks-storage/querier.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet

 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.

-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).

 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.
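
The consistency check described in the changed paragraph can be illustrated with a short sketch. This is not the Cortex implementation: `queryFunc`, `missingBlocks` and `queryWithConsistencyCheck` are hypothetical names, each store-gateway is modelled as a plain function, and one attempt per gateway is assumed as the retry policy.

```go
package main

import "fmt"

// queryFunc is a hypothetical stand-in for a store-gateway request: it
// receives the block IDs expected to be queried and returns the block IDs
// that were actually queried.
type queryFunc func(blockIDs []string) []string

// missingBlocks returns the expected IDs that do not appear in queried,
// preserving the order of expected.
func missingBlocks(expected, queried []string) []string {
	seen := make(map[string]struct{}, len(queried))
	for _, id := range queried {
		seen[id] = struct{}{}
	}
	var missing []string
	for _, id := range expected {
		if _, ok := seen[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

// queryWithConsistencyCheck asks each gateway in turn for the blocks that
// are still missing and fails if some blocks remain unqueried at the end.
func queryWithConsistencyCheck(expected []string, gateways []queryFunc) error {
	remaining := expected
	for _, query := range gateways {
		if len(remaining) == 0 {
			break
		}
		remaining = missingBlocks(remaining, query(remaining))
	}
	if len(remaining) > 0 {
		return fmt.Errorf("consistency check failed, blocks not queried: %v", remaining)
	}
	return nil
}

func main() {
	// The first gateway has only loaded block-1; the second has everything.
	partial := func(ids []string) []string { return []string{"block-1"} }
	full := func(ids []string) []string { return ids }

	fmt.Println(queryWithConsistencyCheck([]string{"block-1", "block-2"}, []queryFunc{partial, full}))
	fmt.Println(queryWithConsistencyCheck([]string{"block-1", "block-2"}, []queryFunc{partial}))
}
```

With a single gateway there is nowhere to retry, which mirrors the condition in the text that retries only help when the replication factor is greater than `1`.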

docs/blocks-storage/querier.template

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet

 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.

-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).

 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.

docs/blocks-storage/store-gateway.md

Lines changed: 3 additions & 3 deletions
@@ -31,7 +31,7 @@ Store-gateways continuously monitor the ring state and whenever the ring topolog

 For each block belonging to a store-gateway shard, the store-gateway loads its `meta.json`, the `deletion-mark.json` and the index-header. Once a block is loaded on the store-gateway, it's ready to be queried by queriers. When the querier queries blocks through a store-gateway, the response will contain the list of actually queried block IDs. If a querier tries to query a block which has not been loaded by a store-gateway, the querier will either retry on a different store-gateway (if blocks replication is enabled) or fail the query.

-Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).
+Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.sharding-ring.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).

 This feature can be enabled via `-store-gateway.sharding-enabled=true` and requires the backend [hash ring](../architecture.md#the-hash-ring) to be configured via `-store-gateway.sharding-ring.*` flags (or their respective YAML config options).

@@ -199,12 +199,12 @@ store_gateway:
     # The replication factor to use when sharding blocks. This option needs be
     # set both on the store-gateway and querier when running in microservices
     # mode.
-    # CLI flag: -store-gateway.replication-factor
+    # CLI flag: -store-gateway.sharding-ring.replication-factor
     [replication_factor: <int> | default = 3]

     # File path where tokens are stored. If empty, tokens are not stored at
     # shutdown and restored at startup.
-    # CLI flag: -store-gateway.tokens-file-path
+    # CLI flag: -store-gateway.sharding-ring.tokens-file-path
     [tokens_file_path: <string> | default = ""]

     # The sharding strategy to use. Supported values are: default,

docs/blocks-storage/store-gateway.template

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ Store-gateways continuously monitor the ring state and whenever the ring topolog

 For each block belonging to a store-gateway shard, the store-gateway loads its `meta.json`, the `deletion-mark.json` and the index-header. Once a block is loaded on the store-gateway, it's ready to be queried by queriers. When the querier queries blocks through a store-gateway, the response will contain the list of actually queried block IDs. If a querier tries to query a block which has not been loaded by a store-gateway, the querier will either retry on a different store-gateway (if blocks replication is enabled) or fail the query.

-Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).
+Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.sharding-ring.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).

 This feature can be enabled via `-store-gateway.sharding-enabled=true` and requires the backend [hash ring](../architecture.md#the-hash-ring) to be configured via `-store-gateway.sharding-ring.*` flags (or their respective YAML config options).

docs/configuration/config-file-reference.md

Lines changed: 2 additions & 2 deletions
@@ -3622,12 +3622,12 @@ sharding_ring:

   # The replication factor to use when sharding blocks. This option needs be set
   # both on the store-gateway and querier when running in microservices mode.
-  # CLI flag: -store-gateway.replication-factor
+  # CLI flag: -store-gateway.sharding-ring.replication-factor
   [replication_factor: <int> | default = 3]

   # File path where tokens are stored. If empty, tokens are not stored at
   # shutdown and restored at startup.
-  # CLI flag: -store-gateway.tokens-file-path
+  # CLI flag: -store-gateway.sharding-ring.tokens-file-path
   [tokens_file_path: <string> | default = ""]

   # The sharding strategy to use. Supported values are: default, shuffle-sharding.

integration/backward_compatibility_test.go

Lines changed: 13 additions & 21 deletions
@@ -27,15 +27,7 @@ var (
 		// 0.7.0 used 204 status code for all components
 		"quay.io/cortexproject/cortex:v0.7.0": preCortex10Flags,

-		"quay.io/cortexproject/cortex:v1.0.0": func(flags map[string]string) map[string]string {
-			return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
-				"-store-gateway.sharding-enabled":              "",
-				"-store-gateway.sharding-ring.store":           "",
-				"-store-gateway.sharding-ring.consul.hostname": "",
-				"-store-gateway.replication-factor":            "",
-			})
-		},
-
+		"quay.io/cortexproject/cortex:v1.0.0": preCortex14Flags,
 		"quay.io/cortexproject/cortex:v1.1.0": preCortex14Flags,
 		"quay.io/cortexproject/cortex:v1.2.0": preCortex14Flags,
 		"quay.io/cortexproject/cortex:v1.3.0": preCortex14Flags,
@@ -44,24 +36,24 @@ var (

 func preCortex10Flags(flags map[string]string) map[string]string {
 	return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
-		"-schema-config-file":                          "",
-		"-config-yaml":                                 flags["-schema-config-file"],
-		"-table-manager.poll-interval":                 "",
-		"-dynamodb.poll-interval":                      flags["-table-manager.poll-interval"],
-		"-store-gateway.sharding-enabled":              "",
-		"-store-gateway.sharding-ring.store":           "",
-		"-store-gateway.sharding-ring.consul.hostname": "",
-		"-store-gateway.replication-factor":            "",
+		"-schema-config-file":                             "",
+		"-config-yaml":                                    flags["-schema-config-file"],
+		"-table-manager.poll-interval":                    "",
+		"-dynamodb.poll-interval":                         flags["-table-manager.poll-interval"],
+		"-store-gateway.sharding-enabled":                 "",
+		"-store-gateway.sharding-ring.store":              "",
+		"-store-gateway.sharding-ring.consul.hostname":    "",
+		"-store-gateway.sharding-ring.replication-factor": "",
 	})
 }

 func preCortex14Flags(flags map[string]string) map[string]string {
 	return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
 		// Blocks storage CLI flags removed the "experimental" prefix in 1.4.
-		"-store-gateway.sharding-enabled":              "",
-		"-store-gateway.sharding-ring.store":           "",
-		"-store-gateway.sharding-ring.consul.hostname": "",
-		"-store-gateway.replication-factor":            "",
+		"-store-gateway.sharding-enabled":                 "",
+		"-store-gateway.sharding-ring.store":              "",
+		"-store-gateway.sharding-ring.consul.hostname":    "",
+		"-store-gateway.sharding-ring.replication-factor": "",
 	})
 }
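
The flag mappers in this file overlay empty values onto the current flag set. A plausible reading, sketched below, is that an empty value marks a flag to be left out when the command line for an older image is built. That reading is an assumption: the diff does not show how empty values are consumed later. `mergeKeepEmpty`, `dropEmpty` and `legacyFlags` are hypothetical helpers written for this sketch, not functions from the `e2e` package.

```go
package main

import "fmt"

// mergeKeepEmpty merges flag maps; later maps override earlier ones and
// empty values are kept. Hypothetical helper for illustration only.
func mergeKeepEmpty(inputs ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, in := range inputs {
		for name, value := range in {
			out[name] = value
		}
	}
	return out
}

// dropEmpty removes flags whose value is empty. Assumption: an empty value
// means "do not pass this flag to the older binary".
func dropEmpty(flags map[string]string) map[string]string {
	out := make(map[string]string, len(flags))
	for name, value := range flags {
		if value != "" {
			out[name] = value
		}
	}
	return out
}

// legacyFlags mirrors the shape of preCortex14Flags after this commit:
// it blanks the store-gateway ring flags under their new names.
func legacyFlags(flags map[string]string) map[string]string {
	return mergeKeepEmpty(flags, map[string]string{
		"-store-gateway.sharding-enabled":                 "",
		"-store-gateway.sharding-ring.store":              "",
		"-store-gateway.sharding-ring.consul.hostname":    "",
		"-store-gateway.sharding-ring.replication-factor": "",
	})
}

func main() {
	current := map[string]string{
		"-target": "querier",
		"-store-gateway.sharding-ring.replication-factor": "1",
	}
	fmt.Println(dropEmpty(legacyFlags(current)))
}
```

Under that assumption, the mapper has to blank the flag under the same name the services set it with, which would explain why the key in the mappers changes together with the rename.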

integration/e2ecortex/services.go

Lines changed: 8 additions & 8 deletions
@@ -83,10 +83,10 @@ func NewQuerierWithConfigFile(name, consulAddress, configFile string, flags map[
 			"-querier.frontend-client.backoff-retries": "1",
 			"-querier.worker-parallelism": "1",
 			// Store-gateway ring backend.
-			"-store-gateway.sharding-enabled":              "true",
-			"-store-gateway.sharding-ring.store":           "consul",
-			"-store-gateway.sharding-ring.consul.hostname": consulAddress,
-			"-store-gateway.replication-factor":            "1",
+			"-store-gateway.sharding-enabled":                 "true",
+			"-store-gateway.sharding-ring.store":              "consul",
+			"-store-gateway.sharding-ring.consul.hostname":    consulAddress,
+			"-store-gateway.sharding-ring.replication-factor": "1",
 		}, flags))...),
 		e2e.NewHTTPReadinessProbe(httpPort, "/ready", 200, 299),
 		httpPort,
@@ -114,10 +114,10 @@ func NewStoreGatewayWithConfigFile(name, consulAddress, configFile string, flags
 			"-target": "store-gateway",
 			"-log.level": "warn",
 			// Store-gateway ring backend.
-			"-store-gateway.sharding-enabled":              "true",
-			"-store-gateway.sharding-ring.store":           "consul",
-			"-store-gateway.sharding-ring.consul.hostname": consulAddress,
-			"-store-gateway.replication-factor":            "1",
+			"-store-gateway.sharding-enabled":                 "true",
+			"-store-gateway.sharding-ring.store":              "consul",
+			"-store-gateway.sharding-ring.consul.hostname":    consulAddress,
+			"-store-gateway.sharding-ring.replication-factor": "1",
 		}, flags))...),
 		e2e.NewHTTPReadinessProbe(httpPort, "/ready", 200, 299),
 		httpPort,

integration/querier_test.go

Lines changed: 4 additions & 4 deletions
@@ -295,10 +295,10 @@ func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) {
 				// Distributor.
 				"-distributor.replication-factor": strconv.FormatInt(seriesReplicationFactor, 10),
 				// Store-gateway.
-				"-store-gateway.sharding-enabled":              strconv.FormatBool(testCfg.blocksShardingEnabled),
-				"-store-gateway.sharding-ring.store":           "consul",
-				"-store-gateway.sharding-ring.consul.hostname": consul.NetworkHTTPEndpoint(),
-				"-store-gateway.replication-factor":            "1",
+				"-store-gateway.sharding-enabled":                 strconv.FormatBool(testCfg.blocksShardingEnabled),
+				"-store-gateway.sharding-ring.store":              "consul",
+				"-store-gateway.sharding-ring.consul.hostname":    consul.NetworkHTTPEndpoint(),
+				"-store-gateway.sharding-ring.replication-factor": "1",
 			})

 			// Start Cortex replicas.

pkg/storegateway/gateway_ring.go

Lines changed: 11 additions & 9 deletions
@@ -61,19 +61,21 @@ func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) {
 		os.Exit(1)
 	}

+	ringFlagsPrefix := "store-gateway.sharding-ring."
+
 	// Ring flags
-	cfg.KVStore.RegisterFlagsWithPrefix("store-gateway.sharding-ring.", "collectors/", f)
-	f.DurationVar(&cfg.HeartbeatPeriod, "store-gateway.sharding-ring.heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring.")
-	f.DurationVar(&cfg.HeartbeatTimeout, "store-gateway.sharding-ring.heartbeat-timeout", time.Minute, "The heartbeat timeout after which store gateways are considered unhealthy within the ring."+sharedOptionWithQuerier)
-	f.IntVar(&cfg.ReplicationFactor, "store-gateway.replication-factor", 3, "The replication factor to use when sharding blocks."+sharedOptionWithQuerier)
-	f.StringVar(&cfg.TokensFilePath, "store-gateway.tokens-file-path", "", "File path where tokens are stored. If empty, tokens are not stored at shutdown and restored at startup.")
+	cfg.KVStore.RegisterFlagsWithPrefix(ringFlagsPrefix, "collectors/", f)
+	f.DurationVar(&cfg.HeartbeatPeriod, ringFlagsPrefix+"heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring.")
+	f.DurationVar(&cfg.HeartbeatTimeout, ringFlagsPrefix+"heartbeat-timeout", time.Minute, "The heartbeat timeout after which store gateways are considered unhealthy within the ring."+sharedOptionWithQuerier)
+	f.IntVar(&cfg.ReplicationFactor, ringFlagsPrefix+"replication-factor", 3, "The replication factor to use when sharding blocks."+sharedOptionWithQuerier)
+	f.StringVar(&cfg.TokensFilePath, ringFlagsPrefix+"tokens-file-path", "", "File path where tokens are stored. If empty, tokens are not stored at shutdown and restored at startup.")

 	// Instance flags
 	cfg.InstanceInterfaceNames = []string{"eth0", "en0"}
-	f.Var((*flagext.StringSlice)(&cfg.InstanceInterfaceNames), "store-gateway.sharding-ring.instance-interface", "Name of network interface to read address from.")
-	f.StringVar(&cfg.InstanceAddr, "store-gateway.sharding-ring.instance-addr", "", "IP address to advertise in the ring.")
-	f.IntVar(&cfg.InstancePort, "store-gateway.sharding-ring.instance-port", 0, "Port to advertise in the ring (defaults to server.grpc-listen-port).")
-	f.StringVar(&cfg.InstanceID, "store-gateway.sharding-ring.instance-id", hostname, "Instance ID to register in the ring.")
+	f.Var((*flagext.StringSlice)(&cfg.InstanceInterfaceNames), ringFlagsPrefix+"instance-interface", "Name of network interface to read address from.")
+	f.StringVar(&cfg.InstanceAddr, ringFlagsPrefix+"instance-addr", "", "IP address to advertise in the ring.")
+	f.IntVar(&cfg.InstancePort, ringFlagsPrefix+"instance-port", 0, "Port to advertise in the ring (defaults to server.grpc-listen-port).")
+	f.StringVar(&cfg.InstanceID, ringFlagsPrefix+"instance-id", hostname, "Instance ID to register in the ring.")

 	// Defaults for internal settings.
 	cfg.RingCheckPeriod = 5 * time.Second
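
The effect of building every flag name from one shared prefix can be checked in isolation with the standard `flag` package. The sketch below is a trimmed stand-in, not the Cortex code: `ringConfig`, `registerFlags` and `parseRingFlags` are hypothetical names, and only three of the fields from the diff are reproduced.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"time"
)

// ringConfig is a reduced, hypothetical version of RingConfig that keeps
// only three of the fields registered in the diff above.
type ringConfig struct {
	HeartbeatPeriod   time.Duration
	ReplicationFactor int
	TokensFilePath    string
}

// ringFlagsPrefix is the shared prefix introduced by the commit.
const ringFlagsPrefix = "store-gateway.sharding-ring."

// registerFlags derives every flag name from the shared prefix, so a flag
// cannot end up outside the sharding-ring namespace by accident.
func (cfg *ringConfig) registerFlags(f *flag.FlagSet) {
	f.DurationVar(&cfg.HeartbeatPeriod, ringFlagsPrefix+"heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring.")
	f.IntVar(&cfg.ReplicationFactor, ringFlagsPrefix+"replication-factor", 3, "The replication factor to use when sharding blocks.")
	f.StringVar(&cfg.TokensFilePath, ringFlagsPrefix+"tokens-file-path", "", "File path where tokens are stored.")
}

// parseRingFlags registers the flags on a fresh FlagSet and parses args.
func parseRingFlags(args []string) (ringConfig, error) {
	var cfg ringConfig
	fs := flag.NewFlagSet("store-gateway-example", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	cfg.registerFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseRingFlags([]string{"-store-gateway.sharding-ring.replication-factor=2"})
	fmt.Println(cfg.ReplicationFactor, err)

	// The pre-rename name is no longer registered, so parsing it fails.
	_, err = parseRingFlags([]string{"-store-gateway.replication-factor=2"})
	fmt.Println(err)
}
```

Because the old names are not registered at all, a command line that still uses them fails at parse time rather than being silently ignored.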
