Releases: Altinity/clickhouse-operator
release-0.23.0
Added
- Kubernetes secrets are now supported with the standard syntax for user passwords, configuration settings, and configuration files, for example:
    users:
      user1/password:
        valueFrom:
          secretKeyRef:
            name: clickhouse_secret
            key: pwduser1
    settings:
      s3/my_bucket/access_key:
        valueFrom:
          secretKeyRef:
            name: s3-credentials
            key: AWS_ACCESS_KEY_ID
    files:
      server.key:
        valueFrom:
          secretKeyRef:
            name: clickhouse-certs
            key: server.key
  See the updated Security Hardening Guide for more detail.
- ClickHouse Keeper is now supported through a new resource, kind: ClickHouseKeeperInstallation
  See examples here: https://github.com/Altinity/clickhouse-operator/tree/0.23.0/docs/chk-examples
  The implementation is not final; the following still needs to be done:
  - dynamic reconfiguration, which is required to support adding and removing Keeper replicas
  - integration with ClickHouseInstallation, so that Keeper can be referenced by a resource reference instead of by a service
- CHI labels are now added to exported Prometheus metrics
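To make the secret reference syntax in this release concrete, the following Python sketch shows one way a valueFrom/secretKeyRef entry could be resolved against secret data. It is a conceptual illustration only: resolve_secret_refs and the in-memory secrets mapping are hypothetical names, and the sketch does not reflect how the operator actually reads Kubernetes secrets.

```python
def resolve_secret_refs(section: dict, secrets: dict) -> dict:
    """Replace valueFrom/secretKeyRef entries with values from `secrets`.

    `section` mirrors one block of the CHI configuration (users, settings
    or files). `secrets` is a hypothetical in-memory stand-in for
    Kubernetes secrets: {secret_name: {key: value}}.
    """
    resolved = {}
    for name, value in section.items():
        ref = None
        if isinstance(value, dict):
            ref = value.get("valueFrom", {}).get("secretKeyRef")
        if ref is None:
            # Plain value, keep it unchanged.
            resolved[name] = value
        else:
            # Look up the referenced key in the referenced secret.
            resolved[name] = secrets[ref["name"]][ref["key"]]
    return resolved


users = {
    "user1/password": {
        "valueFrom": {"secretKeyRef": {"name": "clickhouse_secret", "key": "pwduser1"}}
    },
    "user1/profile": "default",
}
secrets = {"clickhouse_secret": {"pwduser1": "example-password"}}
resolved_users = resolve_secret_refs(users, secrets)
```

A missing secret or key raises KeyError in this sketch, which is one possible design choice: failing loudly rather than writing an empty password.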
Changed
- Services are now re-created if ServiceType is changed, in order to work around a Kubernetes issue. Closes #1302
- Operator now waits for ClickHouse service endpoints to respond when checking that a node is up.
- CHI templates are now automatically reloaded by the operator. Previously, templates were only reloaded during startup. To apply template changes to a CHI, a CHI update still needs to be triggered.
- Operator will now crash if the operator configuration is broken or cannot be parsed. This prevents falling back to the defaults in case of errors.
Fixed
- Fixed schema propagation on new replicas for ClickHouse 23.11 and above
- Fixed data recovery when PVC is deleted by a user. Closes #1310
Improved
- Improve helm, update values.yaml to properly generate helm/README.md by @Slach in #1278
- Improve clickhouse-keeper manifests by @Slach in #1234
- chore: remove refs to deprecated io/ioutil by @testwill in #1273
- Update URL for accepted logging levels by @madrisan in #1270
- Add a chi example for sync users by @ccsxs in #1304
- Bump zookeeper operator version to 0.2.15 by @GrahamCampbell in #1303
- Optional values.rbac to deploy rbac resources by @Salec in #1316
- update helm chart generator to treat config.yaml as yaml in values by @echozio in #1317
Full Changelog: release-0.22.2...release-0.23.0
release-0.22.2
What's Changed
- Fixed a bug where the operator did not restart ClickHouse pods if the 'files' section was changed without a 'config.d' destination, e.g. files/settings.xml.
- Fix ServiceMonitor endpoints #1276 by @MiguelNdeCarvalho, and #1290 by @muicoder. Closes #1287
- Disabled prefer_localhost_replica in default profile
Full Changelog: release-0.22.1...release-0.22.2
release-0.22.1
Added
- New 'Aborted' status for CHI is set when reconcile is aborted by an operator
Changed
- Allow shard weight to be zero (#1192 by maxistua)
- Removed excessive logging for pod update events
- Removed 30s delay after creating a service
- Allow empty values for CRD status and some other fields in order to facilitate migration from old operator versions that were upgraded without upgrading CRD first. Fixes #842, #890 and similar issues.
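As background for the zero shard weight change, ClickHouse's Distributed engine documentation describes choosing a shard by taking the sharding expression modulo the total weight and mapping the remainder onto consecutive weight ranges. The Python sketch below follows that description; shard_for is a hypothetical helper written for illustration, not operator or ClickHouse code. It shows that a shard with weight 0 owns an empty range and therefore receives no rows.

```python
def shard_for(sharding_value: int, weights: list) -> int:
    """Return the index of the shard that receives a row.

    The remainder of sharding_value divided by the total weight is mapped
    onto consecutive half-open ranges, one per shard, each as wide as that
    shard's weight. A zero-weight shard has an empty range.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one shard must have a positive weight")
    remainder = sharding_value % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always below total")


weights = [1, 0, 2]  # the second shard has weight zero
chosen = {shard_for(value, weights) for value in range(100)}
```

With these weights, the second shard (index 1) never appears in chosen, which is the practical effect of setting a shard's weight to zero.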
Full Changelog: release-0.22.0...release-0.22.1
release-0.22.0
Added
- Support volume re-provisioning. If volume is broken and PVC detects it as lost, operator re-provisions the volume
- When new CHI is created, all hosts are created in parallel
- Allow turning off waiting for running queries to complete. This can be done either in the operator configuration or in the CHI itself:
  In operator configuration:
    spec:
      reconcile:
        host:
          wait:
            queries: "false"
  In CHI:
    spec:
      reconciling:
        policy: nowait
- When changes are applied to clusters with a lot of shards, the change is probed on the first node only. If successful, it is applied to up to 50% of shards concurrently. This can be configured in operator configuration:
    reconcile:
      # Reconcile runtime settings
      runtime:
        # Max number of concurrent CHI reconciles in progress
        reconcileCHIsThreadsNumber: 10
        # The operator reconciles shards concurrently in each CHI with the following limitations:
        #   1. Number of shards being reconciled (and thus having hosts down) in each CHI concurrently
        #      can not be greater than 'reconcileShardsThreadsNumber'.
        #   2. Percentage of shards being reconciled (and thus having hosts down) in each CHI concurrently
        #      can not be greater than 'reconcileShardsMaxConcurrencyPercent'.
        #   3. The first shard is always reconciled alone. Concurrency starts from the second shard and onward.
        # Thus limiting number of shards being reconciled (and thus having hosts down) in each CHI by both number and percentage
        # Max number of concurrent shard reconciles within one CHI in progress
        reconcileShardsThreadsNumber: 5
        # Max percentage of concurrent shard reconciles within one CHI in progress
        reconcileShardsMaxConcurrencyPercent: 50
- Operator-related metrics are exposed to Prometheus now:
    clickhouse_operator_chi_reconciles_started
    clickhouse_operator_chi_reconciles_completed
    clickhouse_operator_chi_reconciles_timings
    clickhouse_operator_host_reconciles_started
    clickhouse_operator_host_reconciles_completed
    clickhouse_operator_host_reconciles_restarts
    clickhouse_operator_host_reconciles_errors
    clickhouse_operator_host_reconciles_timings
    clickhouse_operator_pod_add_events
    clickhouse_operator_pod_update_events
    clickhouse_operator_pod_delete_events
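The shard concurrency limits in this release (reconcileShardsThreadsNumber and reconcileShardsMaxConcurrencyPercent, with the first shard always reconciled alone) can be illustrated with a small Python sketch. The helper names, the round-down behaviour, the minimum of one shard, and the fixed batching are all assumptions made for illustration; the operator may schedule shards differently, for example with a worker pool.

```python
def max_concurrent_shards(total_shards: int, threads: int = 5, max_percent: int = 50) -> int:
    """Upper bound on shards reconciled at once under both limits.

    Assumptions: the percentage limit is rounded down, and at least one
    shard is always allowed to proceed.
    """
    if total_shards <= 0:
        return 0
    by_percent = total_shards * max_percent // 100
    return max(1, min(threads, by_percent))


def plan_batches(total_shards: int, threads: int = 5, max_percent: int = 50) -> list:
    """Group shard indexes into batches: the first shard alone, then the rest."""
    if total_shards <= 0:
        return []
    limit = max_concurrent_shards(total_shards, threads, max_percent)
    batches = [[0]]  # the first shard is always reconciled alone
    rest = list(range(1, total_shards))
    for start in range(0, len(rest), limit):
        batches.append(rest[start:start + limit])
    return batches


plan = plan_batches(8)  # defaults: 5 threads, 50 percent
```

With 8 shards and the default values, the percentage limit (4 shards) is stricter than the thread limit (5), so the sketch reconciles shard 0 first, then shards 1 to 4 together, then shards 5 to 7.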
Changed
- fix typo in operator_installation_details.md by @seeekr in #1219
- Set operator release date for createdAt CSV field by @dmvolod in #1223
- Fix type for exclude and include fields in 70-chop-config.yaml example by @dmvolod in #1222
- change dashboard refresh rate 1m and add min_duration_ms, max_duration_ms dashboard variables, rename query_type to query_kind by @Slach in #1235
- add securityContext to helm chart by @farodin91 in #1255
- metrics-exporter collects all hosts and queries in parallel
Fixed
- Fixed a bug when operator could break multiple nodes if incorrect configuration has been deployed several times in a row
- Fixed a bug when schema could not be created on new nodes, if nodes took too long to start
- Fixed a bug when services were not reconciled in rare cases
New Contributors
- @seeekr made their first contribution in #1219
- @dmvolod made their first contribution in #1223
- @farodin91 made their first contribution in #1255
Full Changelog: release-0.21.3...release-0.22.0
release-0.21.3
Added
- Added selectors to CHITemplates. Now it is possible to define a template that is applied to certain CHI. Example here: https://github.com/Altinity/clickhouse-operator/blob/0.21.3/docs/chi-examples/50-CHIT-04-auto-template-volume-with-selector.yaml
- Added '.status.useTemplates' to list all templates used in a CHI, whether referenced manually or applied automatically
Changed
- CHITemplates are now re-loaded automatically without a need to restart operator. Changes in CHITemplates are not applied automatically to affected CHI.
Fixed
- Fix nil pointer deref in metrics exporter (#1187) by @zcross in #1188
- Migrate piechart plugin on Grafana Dashboard by @MiguelNdeCarvalho in #1190
- Fixed a permission error that sometimes occurred when deleting a Pod
New Contributors
- @MiguelNdeCarvalho made their first contribution in #1190
Full Changelog: release-0.21.2...release-0.21.3
release-0.21.2
Added
- Added support of clickhouse-log via podTemplates by @dmmarkov in #1012
- Added SQL UDFs replication when adding shards/replicas. Closes #1174
Changed
- Operator and metrics-exporter now automatically select 'http' or 'https' for connecting to cluster based on cluster configuration
- Changed statefulSet update behaviour to abort the update on failure. This protects the rest of the cluster from breaking changes
- Removed unneeded operations on persistent volumes
Fixed
- fix crash when reconcilePVC() failed by @jewelzqiu in #1168
- fix reconcile threads number by @jewelzqiu in #1170
Other
- Run tests in parallel by @antip00 in #1171
- fix build script to adapt to macos m1. by @xiedeyantu in #1169
- Improvements to Grafana and Prometheus manifests
New Contributors
- @jewelzqiu made their first contribution in #1168
- @xiedeyantu made their first contribution in #1169
- @dmmarkov made their first contribution in #1012
Full Changelog: release-0.21.1...release-0.21.2
release-0.21.1
Added
- Added configurable shard-level concurrent reconciliation by @zcross in #1124. It is controlled by the following operator configuration settings:
    reconcile:
      runtime:
        # Max number of concurrent CHI reconciles in progress
        reconcileCHIsThreadsNumber: 10
        # Max number of concurrent shard reconciles in progress
        reconcileShardsThreadsNumber: 1
        # The maximum percentage of cluster shards that may be reconciled in parallel
        reconcileShardsMaxConcurrencyPercent: 50
- Added default configuration for ClickHouse system.trace_log table with 30 days TTL
Changed
- ZooKeeper manifests were rewritten to store configuration separately
Fixed
- Fixed a bug that might cause metrics-exporter to stop working on ClickHouse nodes with certain types of system.errors. Closes #1161
- Fixed a bug with ClickHouse major version detection for Altinity.Stable builds
Full Changelog: release-0.21.0...release-0.21.1
release-0.21.0
What's Changed
- Changed the way Operator applies ClickHouse settings. In the previous version, every change in settings required a restart via re-creating a StatefulSet. In this version it no longer re-creates the StatefulSet, but uses logic that decides whether ClickHouse needs to be restarted in order to pick up a change. When a restart is needed, it is performed by scaling the StatefulSet down and up. The restart logic is controlled by the configurationRestartPolicy configuration setting. Here is the default configuration:
    rules:
      - version: "*"
        rules:
          - settings/*: "yes"
          - settings/dictionaries_config: "no"
          - settings/logger: "no"
          - settings/macros/*: "no"
          - settings/max_server_memory_*: "no"
          - settings/max_*_to_drop: "no"
          - settings/max_concurrent_queries: "no"
          - settings/models_config: "no"
          - settings/user_defined_executable_functions_config: "no"
          - zookeeper/*: "yes"
          - files/config.d/*.xml: "yes"
          - files/config.d/*dict*.xml: "no"
          - profiles/default/background_*_pool_size: "yes"
          - profiles/default/max_*_for_server: "yes"
      - version: "21.*"
        rules:
          - settings/logger: "yes"
- Improved reliability of schema creation for new shards/replicas
- Added "secure" option for Zookeeper connections by @Tvion in #1114
- Added "insecure" flag that closes insecure TCP/HTTP ClickHouse ports. See security_hardening.md for more detail
- Added an option to disable metrics exporter by @roimor in #1131
- Added system.errors scraping into metrics-exporter
- Refactor Registry internals to make concurrent access safe by @zcross in #1115
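One way to read the default configurationRestartPolicy rules from this release is sketched below in Python. The evaluation semantics are assumptions, not confirmed operator behaviour: patterns are treated as shell-style globs, rules are scanned in order with the last match winning, version sections are matched the same way, and a path matched by no rule does not trigger a restart. needs_restart is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Default rules copied from the release notes.
DEFAULT_POLICY = [
    {"version": "*", "rules": [
        {"settings/*": "yes"},
        {"settings/dictionaries_config": "no"},
        {"settings/logger": "no"},
        {"settings/macros/*": "no"},
        {"settings/max_server_memory_*": "no"},
        {"settings/max_*_to_drop": "no"},
        {"settings/max_concurrent_queries": "no"},
        {"settings/models_config": "no"},
        {"settings/user_defined_executable_functions_config": "no"},
        {"zookeeper/*": "yes"},
        {"files/config.d/*.xml": "yes"},
        {"files/config.d/*dict*.xml": "no"},
        {"profiles/default/background_*_pool_size": "yes"},
        {"profiles/default/max_*_for_server": "yes"},
    ]},
    {"version": "21.*", "rules": [
        {"settings/logger": "yes"},
    ]},
]


def needs_restart(version: str, changed_path: str, policy=DEFAULT_POLICY) -> bool:
    """Decide whether a changed setting requires a ClickHouse restart.

    Assumed semantics: last matching rule wins; no match means no restart.
    """
    decision = False
    for section in policy:
        if not fnmatchcase(version, section["version"]):
            continue  # this section does not apply to the running version
        for rule in section["rules"]:
            for pattern, answer in rule.items():
                if fnmatchcase(changed_path, pattern):
                    decision = answer == "yes"
    return decision


logger_on_23 = needs_restart("23.8.1", "settings/logger")
logger_on_21 = needs_restart("21.8.3", "settings/logger")
```

Under these assumptions, a logger change does not restart a 23.x server but does restart a 21.x server, because the version-specific section is evaluated after the generic one.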
Fixed
- Fixed a bug that might result in PDB being deleted. Closes #1139
- Fixed propagation of podTemplate environment variables from ClickHouseInstallationTemplate to ClickHouseInstallation
- Fixed propagation of startup probe from ClickHouseInstallationTemplate to ClickHouseInstallation
Non-operator changes
- Changed Grafana deployment to allow persisting custom dashboards
- Changed ZooKeeper version to 3.8.1
New Contributors
- @roimor made their first contribution in #1131
- @Tvion made their first contribution in #1114
- @zcross made their first contribution in #1115
Full Changelog: release-0.20.3...release-0.21.0
release-0.20.3
What's Changed
- Use alpine base image instead of UBI
- Fix error handling when table already exists in ZooKeeper for new ClickHouse versions
Full Changelog: release-0.20.2...release-0.20.3
release-0.20.2
What's Changed
- Added 'hostsCompleted' to the CHI status and events
- Changed some 'default' profile settings:
  - Enabled 'do_not_merge_across_partitions_select_final'
  - Set 'load_balancing' to 'nearest_hostname'
  - Set niceness ('os_thread_priority') to 2
- Improved stability of metrics-exporter when some ClickHouse nodes are responding slowly
- Changed sequence of LB service creation to avoid a situation when service exists with no available endpoints
- Added 'secure' flag at cluster level for enabling distributed queries over TLS
- Addressed CVEs in dependent packages
Note: the datatype of the 'secure' flag has been changed from boolean to String (accepting 'true', 'yes', '1', etc.)
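The string-typed 'secure' flag can be pictured as a tolerant boolean parser. The Python sketch below is illustrative only: parse_string_bool is a hypothetical helper, the accepted true values are limited to the three named in the note, and the false values, case-insensitivity, and whitespace trimming are assumptions. The operator's actual list of accepted values may be longer.

```python
# Values named in the release note.
TRUE_VALUES = {"true", "yes", "1"}
# Assumed counterparts; not taken from the release note.
FALSE_VALUES = {"false", "no", "0"}


def parse_string_bool(value: str) -> bool:
    """Interpret a string flag such as 'yes' or '0' as a boolean."""
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognized boolean string: {value!r}")


secure = parse_string_bool("yes")
```

Rejecting unknown strings with an error, instead of treating them as false, avoids silently disabling TLS because of a typo in the manifest.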