Releases: kubeflow/spark-operator
v2.5.0
🚀 Spark on Kubernetes Operator v2.5.0
We're excited to announce the release of Spark on Kubernetes Operator v2.5.0! This release introduces
alpha feature gates, namespace label-based watching, Python API generation, SparkConnect webhook validation,
and several important bug fixes including an OOM prevention fix.
✨ New Features
- Feature gate mechanism: Alpha feature gates are now supported, enabling experimental features to be toggled safely (#2794)
  - LoadSparkDefaults feature gate (Alpha): Adds --load-spark-defaults to spark-submit for Spark 4.0+ (#2798)
  - PartialRestart feature gate (Alpha): Skips reconcile for webhook-patched executor fields, reducing unnecessary restarts (#2786)
- Watch namespaces by labels: Operator can now watch namespaces based on label selectors (#2808)
- Configurable timestamp precision for ScheduledSparkApplication names (#2827)
- Python API generation for Spark Operator CRDs (#2828)
🐛 Bug Fixes
- OOM prevention: Add label selector to ConfigMap cache to prevent OOM via informer flooding (#2881)
- Fix duplicate webhook patch and add missing ScheduledSparkApplication patches (#2875)
- Fix: correct schedule parse error logging in ScheduledSparkApplication controller (#2841)
- Fix: handle nil Executor.Instances in GetExecutorRequestResource (#2834)
- Fix: correct filename typo in metrics package (#2830)
- Fix: propagate batch scheduler initialization errors to trigger retries (#2783)
- Reset SparkApplication status when transitioning from SUCCEEDING/FAILING → PENDING_RERUN (#2773)
- Server-side apply CRDs by forcing conflicts (#2800)
- Fix SparkConnect nil executor template panic (#2814)
⛵ Helm Chart
- Add hostUsers (user namespace) option (#2721)
- Expose metrics-labels flag via Helm chart (#2817)
- Add toleration, affinity, and nodeSelector support for upgrade hook (#2780)
⚡ SparkConnect
v2.5.0-rc.0
Official Release v2.5.0-rc.0
v2.4.0
If you want to upgrade to v2.4.0, remember to set hook.upgradeCrd=true when running helm upgrade. This will create a Helm pre-install/pre-upgrade Job to run kubectl apply --server-side to update CRDs.
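As a rough illustration of the upgrade note above, the following sketch assembles a helm upgrade invocation that enables the CRD upgrade hook. The release name, chart reference, namespace, and chart version are placeholders chosen for the example, not values taken from these notes; only hook.upgradeCrd=true comes from the release text.

```python
from __future__ import annotations

import shlex


def helm_upgrade_command(release: str, chart: str, namespace: str, version: str) -> list[str]:
    """Build the argv for a `helm upgrade` that enables the CRD upgrade hook."""
    return [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--version", version,
        # From the release notes: creates a pre-install/pre-upgrade Job
        # that runs `kubectl apply --server-side` to update the CRDs.
        "--set", "hook.upgradeCrd=true",
    ]


if __name__ == "__main__":
    # All four arguments are hypothetical placeholders; adjust to your setup.
    cmd = helm_upgrade_command(
        release="spark-operator",
        chart="spark-operator/spark-operator",
        namespace="spark-operator",
        version="2.4.0",
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to subprocess.run without shell quoting concerns.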
Announcements
- We are going to deprecate Kustomize manifests after three months, see #2702. We suggest using Helm to manage spark-operator releases. Please leave comments in the PR if you are still using Kustomize manifests.
Highlights
- SparkApplication now can be suspended/resumed by setting .spec.suspend to true/false and will have an integration with Kueue, see kubernetes-sigs/kueue#7268.
- SparkConnect server service can be customized by .spec.server.service.
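Toggling .spec.suspend on a running application could be scripted as a JSON merge patch. The sketch below is one possible approach, assuming kubectl is on the PATH and that the resource can be addressed as sparkapplication; the application name and namespace are hypothetical, and the helper names are made up for this example.

```python
from __future__ import annotations

import json


def suspend_patch(suspend: bool) -> str:
    """Return a JSON merge patch that sets .spec.suspend."""
    return json.dumps({"spec": {"suspend": suspend}})


def kubectl_patch_command(name: str, namespace: str, suspend: bool) -> list[str]:
    """Build the argv for `kubectl patch` that suspends or resumes an application."""
    return [
        "kubectl", "patch", "sparkapplication", name,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", suspend_patch(suspend),
    ]


if __name__ == "__main__":
    # "spark-pi" and "default" are placeholder values.
    print(kubectl_patch_command("spark-pi", "default", suspend=True))
    print(kubectl_patch_command("spark-pi", "default", suspend=False))
```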
Features
- fix(chart): add revisionHistoryLimit option (#2625 by @t3mi)
- Suspend/Resume feature on SparkApplication (#2387 by @everpeace)
- Set ControllerReference on driver pod and non-Controller OwnerReference on executor pod at submission time (#2670 by @everpeace)
- added SparkApp name validator to accept valid DNS-1035 format (#2711 by @aryankumar04)
- No propagate Kueue labels to driver and executor pods (#2714 by @everpeace)
- feat: add support for customizing connect service (#2709 by @ChenYi015)
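For readers unfamiliar with the DNS-1035 format mentioned in the name validator item above, here is a minimal sketch of the label rule as Kubernetes defines it: lowercase alphanumerics or hyphens, starting with a letter, ending with an alphanumeric, at most 63 characters. Whether the operator's validator applies exactly this check, or adds further length limits, is not stated in these notes; the function name is made up for this example.

```python
import re

# A DNS-1035 label is at most 63 characters long.
DNS1035_LABEL_MAX_LENGTH = 63

# Must start with a lowercase letter, may contain lowercase letters,
# digits, and hyphens, and must end with a letter or digit.
_DNS1035_LABEL = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")


def is_dns1035_label(name: str) -> bool:
    """Return True if `name` is a valid DNS-1035 label."""
    return (
        len(name) <= DNS1035_LABEL_MAX_LENGTH
        and _DNS1035_LABEL.fullmatch(name) is not None
    )


if __name__ == "__main__":
    for candidate in ("spark-pi", "1spark", "Spark-Pi", "spark-"):
        print(candidate, is_dns1035_label(candidate))
```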
Bug Fixes
- Grant get/patch permissions for SparkConnect CRD to hook ClusterRole (#2605 by @ChenYi015)
- Grant create events permissions to Controller (#2616 by @Rockawear)
- fix(connect): propagate executor memory setting to spark options (#2656 by @mrjoe7)
- [fix] add miss cli params. Add leader election params in chart (#2657 by @aagumin)
- fix: webhook encoder configuration (#2664 by @pvbouwel)
- bugfix: hook.upgradeCrd use cases (#2663 by @pvbouwel)
- Correct entrypoint.sh for Openshift (#2645 by @Reamer)
- Fix driver host configuration to handle IPv6 addresses (#2703 by @tiagotxm)
- Add permissions for sparkconnects/finalizers (#2727 by @ChenYi015)
- Remove redundant name and namespace context in logs (#2723 by @ChenYi015)
- Logging info when resources associated with SparkApplication still exist (#2725 by @ChenYi015)
Unit Tests
- Add kube-scheduler podgroup unit tests (#2689 by @shadowinlife)
- Generate HTML coverage report after running unit tests (#2691 by @shadowinlife)
- Avoid 409 (Conflict) error at e2e tests with retry. (#2695 by @shadowinlife)
- Add unit test for sparkapplication_validator (#2692 by @shadowinlife)
- Add unit test for scheduledsparkapplication_validator (#2694 by @shadowinlife)
- Refactor the unit tests for web_ui.go (#2696 by @shadowinlife)
Refactor
- unify SparkApplication defaulting logic (#2671 by @zhzhuang-zju)
- Replace strconv.Atoi with strconv.ParseInt (#2699 by @ChenYi015)
- refactor: use ptr.To to replace util.XxxPtr (#2693 by @ChenYi015)
Dependencies
- Bump actions/checkout from 4 to 5 (#2623 by @dependabot[bot])
- Bump actions/download-artifact from 4 to 5 (#2624 by @dependabot[bot])
- Bump helm.sh/helm/v3 from 3.18.4 to 3.18.5 (#2627 by @dependabot[bot])
- Bump github.com/go-viper/mapstructure/v2 from 2.3.0 to 2.4.0 (#2635 by @dependabot[bot])
- Bump actions/setup-go from 5 to 6 (#2651 by @dependabot[bot])
- Bump actions/stale from 9 to 10 (#2649 by @dependabot[bot])
- Bump aquasecurity/trivy-action from 0.32.0 to 0.33.1 (#2650 by @dependabot[bot])
- Bump github.com/golang/glog from 1.2.4 to 1.2.5 (#2620 by @dependabot[bot])
- Bump github.com/onsi/ginkgo/v2 from 2.23.3 to 2.26.0 (#2660 by @dependabot[bot])
- Bump sigs.k8s.io/scheduler-plugins from 0.31.8 to 0.32.7 (#2675 by @dependabot[bot])
- Bump github/codeql-action from 3 to 4 (#2672 by @dependabot[bot])
- Bump github.com/stretchr/testify from 1.10.0 to 1.11.1 (#2673 by @dependabot[bot])
- Bump golang.org/x/time from 0.9.0 to 0.14.0 (#2686 by @dependabot[bot])
- Bump github.com/prometheus/client_golang from 1.22.0 to 1.23.2 (#2682 by @dependabot[bot])
- Bump github.com/spf13/viper from 1.20.1 to 1.21.0 (#2683 by @dependabot[bot])
- Bump github/codeql-action from 3.29.2 to 4.31.0 (#2684 by @dependabot[bot])
- Bump github.com/containerd/containerd from 1.7.27 to 1.7.29 (#2712 by @dependabot[bot])
- Bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 (#2704 by @dependabot[bot])
- Bump github.com/spf13/cobra from 1.9.1 to 1.10.1 (#2705 by @dependabot[bot])
- Bump github/codeql-action from 4.31.0 to 4.31.2 (#2707 by @dependabot[bot])
- Bump actions/upload-artifact from 4 to 5 (#2681 by @dependabot[bot])
- Bump actions/download-artifact from 5 to 6 (#2680 by @dependabot[bot])
- Bump actions/checkout from 4.2.2 to 5.0.0 (#2685 by @dependabot[bot])
- Bump ossf/scorecard-action from 2.4.2 to 2.4.3 (#2687 by @dependabot[bot])
- Bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.2 (#2719 by @dependabot[bot])
- Bump golang.org/x/mod from 0.27.0 to 0.29.0 (#2720 by @dependabot[bot])
- Bump helm/chart-testing-action from 2.7.0 to 2.8.0 (#2722 by @dependabot[bot])
Misc
- Add changelog for v2.3.0 (#2614 by [@ChenYi015](https://g...
v2.4.0-rc.1
Official Release v2.4.0-rc.1
v2.4.0-rc.0
Official Release v2.4.0-rc.0
v2.3.0
Highlights
- Support Spark v4.
- Add support for Spark Connect by adding a new CRD called SparkConnect. One example can be found here.
- Upgrade CRDs automatically when running helm upgrade by setting hook.upgradeCrd=true. This will create a Helm pre-install/pre-upgrade Job to run kubectl apply --server-side to update CRDs.
- Configure logging format by setting {controller,webhook}.logEncoder to json or console.
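The logging option above expands to two Helm values, controller.logEncoder and webhook.logEncoder. A small sketch of how the corresponding --set flags might be generated follows; the helper name is made up for this example, and the accepted values are limited to the two named in the notes.

```python
from __future__ import annotations

# The two encoder values named in the release notes.
VALID_ENCODERS = ("json", "console")


def log_encoder_set_flags(controller: str, webhook: str) -> list[str]:
    """Build `--set` flags for the controller and webhook log encoders."""
    flags: list[str] = []
    for component, encoder in (("controller", controller), ("webhook", webhook)):
        if encoder not in VALID_ENCODERS:
            raise ValueError(
                f"{component}.logEncoder must be one of {VALID_ENCODERS}, got {encoder!r}"
            )
        flags += ["--set", f"{component}.logEncoder={encoder}"]
    return flags


if __name__ == "__main__":
    # These flags would be appended to a `helm install` or `helm upgrade` command.
    print(log_encoder_set_flags(controller="json", webhook="console"))
```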
Features
- Add support for Spark Connect (#2569 by @ChenYi015)
- upgrade to Spark 4.0.0 (#2564 by @nabuskey)
- Make logging encoder configurable (#2580 by @ChenYi015)
- Include pod.Status.Message in recordExecutorEvent (#2589 by @matschaffer-roblox)
- Add print columns for Spark Connect (#2592 by @nabuskey)
- chore: update prometheus pattern and labels for structured streaming driver (#2581 by @yahwang)
- Add Helm hook to upgrade CRDs (#2371 by @ChenYi015)
Bug Fixes
- Splat recordExecutorEvent args for cleaner event messages (#2582 by @matschaffer-roblox)
- fix: should add executor env when driver env is empty (#2586 by @ChenYi015)
- Add web UI configurations when enabling UI service and ingress (#2599 by @ChenYi015)
- Grant get/patch permissions for SparkConnect CRD to hook ClusterRole (#2605 by @ChenYi015)
Dependencies
- Bump aquasecurity/trivy-action from 0.31.0 to 0.32.0 (#2585 by @dependabot[bot])
- Bump github.com/go-viper/mapstructure/v2 from 2.2.1 to 2.3.0 (#2572 by @dependabot[bot])
- Bump sigs.k8s.io/yaml from 1.4.0 to 1.5.0 (#2577 by @dependabot[bot])
- Bump helm.sh/helm/v3 from 3.17.3 to 3.18.4 (#2587 by @dependabot[bot])
- Bump golang.org/x/mod from 0.25.0 to 0.26.0 (#2608 by @dependabot[bot])
- Bump github.com/onsi/gomega from 1.36.1 to 1.37.0 (#2607 by @dependabot[bot])
Misc
- Add SparkConnect e2e test (#2578 by @ChenYi015)
- feat(docs): Guide to report security vulnerabilities (#2593 by @andreyvelich)
- Read Helm version from go.mod file (#2598 by @ChenYi015)
v2.3.0-rc.0
Official Release v2.3.0-rc.0
v2.2.1
Features
- Customize ingress URL with Spark application ID (#2554 by @ChenYi015)
- Make default ingress tls and annotations configurable in the helm config (#2513 by @Tom-Newton)
- Use code-generator for clientset, informers, listers (#2563 by @jbhalodia-slack)
Misc
- add driver ingress unit tests (#2552 by @nabuskey)
- Get logger from context (#2551 by @ChenYi015)
- Update golangci lint (#2560 by @joshuacuellar1)
Dependencies
- Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 (#2557 by @dependabot[bot])
- Bump github.com/prometheus/client_golang from 1.21.1 to 1.22.0 (#2548 by @dependabot[bot])
- Bump sigs.k8s.io/scheduler-plugins from 0.30.6 to 0.31.8 (#2549 by @dependabot[bot])
- Bump golang.org/x/mod from 0.24.0 to 0.25.0 (#2566 by @dependabot[bot])
- Bump github.com/go-logr/logr from 1.4.2 to 1.4.3 (#2567 by @dependabot[bot])
v2.2.0
Features
- Upgrade to Spark 3.5.5 (#2490 by @jacobsalway)
- Add timeZone to ScheduledSparkApplication (#2471 by @jacobsalway)
- Enable the override of MemoryLimit through webhook (#2478 by @danielrsfreitas)
- Add ShuffleTrackingEnabled to DynamicAllocation struct to allow disabling shuffle tracking (#2511 by @jbhalodia-slack)
- Define SparkApplicationSubmitter interface to allow customizing submitting mechanism (#2500 by @ChenYi015)
- Add support for using cert manager to generate webhook certificates (#2373 by @ChenYi015)
Bug Fixes
- fix: add webhook cert validity checking (#2489 by @teejaded)
- fix and add back unit tests (#2532 by @nabuskey)
- fix volcano tests (#2533 by @nabuskey)
- Add v2 to module path (#2515 by @ChenYi015)
- #2525 spark metrics in depends on prometheus (#2529 by @blcksrx)
Misc
- Add APRA AMCOS to adopters (#2485 by @shuch3ng)
- Bump github.com/stretchr/testify from 1.9.0 to 1.10.0 (#2488 by @dependabot[bot])
- Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.1 (#2487 by @dependabot[bot])
- Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.4 (#2486 by @dependabot[bot])
- Deprecating sparkctl (#2484 by @vikas-saxena02)
- Changing image repo from docker.io to ghcr.io (#2483 by @vikas-saxena02)
- Upgrade Golang to 1.24.1 and golangci-lint to 1.64.8 (#2494 by @jacobsalway)
- Bump helm.sh/helm/v3 from 3.16.2 to 3.17.3 (#2503 by @dependabot[bot])
- Add changelog for v2.1.1 (#2504 by @ChenYi015)
- Remove sparkctl (#2466 by @ChenYi015)
- Bump github.com/spf13/viper from 1.19.0 to 1.20.1 (#2496 by @dependabot[bot])
- Bump golang.org/x/net from 0.37.0 to 0.38.0 (#2505 by @dependabot[bot])
- Remove clientset, informer and listers generated by code-generator (#2506 by @ChenYi015)
- Remove v1beta1 API (#2516 by @ChenYi015)
- add unit tests for driver and executor configs (#2521 by @nabuskey)
- Adding securityContext to spark examples (#2530 by @tarekabouzeid)
- Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 (#2497 by @dependabot[bot])
- Bump golang.org/x/mod from 0.23.0 to 0.24.0 (#2495 by @dependabot[bot])
- Adding Manabu to the reviewers (#2522 by @vara-bonthu)
- Bump manusa/actions-setup-minikube from 2.13.1 to 2.14.0 (#2523 by @dependabot[bot])
- Bump k8s.io dependencies to v0.32.5 (#2540 by @ChenYi015)
- Pass the correct LDFLAGS when building the operator image (#2541 by @ChenYi015)
v2.2.0-rc.1
Spark Operator Official Release v2.2.0-rc.1