@lexfrei lexfrei commented Dec 21, 2025

Summary

When a network interface goes down and comes back up (e.g., after a network driver restart like Mellanox mcab), the kernel drops the interface's multicast group membership. keepalived does not re-add the membership, because the VRRP sockets remain open (fd_in != -1) and are therefore never reopened.

This causes VRRP instances to stop receiving advertisements and stay stuck in BACKUP state indefinitely, even though the interface is back up and functional.

Changes

  • Modify interface_up() to close and reopen the VRRP sockets when the interface comes back up
  • Reopening the sockets restores multicast group membership (IP_ADD_MEMBERSHIP)
  • The logic mirrors the socket handling in cleanup_lost_interface(), but triggers on an interface state change rather than on interface deletion
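
For readers unfamiliar with how the membership is tied to the socket, here is a minimal Python sketch of the close-and-reopen idea. It is an illustration only: keepalived is written in C, and `build_mreqn`, `open_vrrp_socket`, and `reopen_on_link_up` are hypothetical names, not keepalived functions. It assumes Linux (the `struct ip_mreqn` layout) and root privileges for the raw socket.

```python
import socket
import struct

VRRP_GROUP = "224.0.0.18"  # VRRP IPv4 multicast group
VRRP_PROTO = 112           # IP protocol number assigned to VRRP


def build_mreqn(group: str, ifindex: int) -> bytes:
    """Pack a Linux struct ip_mreqn: group address, local address, ifindex."""
    return struct.pack(
        "=4s4si",
        socket.inet_aton(group),
        socket.inet_aton("0.0.0.0"),  # INADDR_ANY: pick the interface by index
        ifindex,
    )


def open_vrrp_socket(ifname: str) -> socket.socket:
    """Open a raw VRRP receive socket and join the group (requires root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, VRRP_PROTO)
    mreqn = build_mreqn(VRRP_GROUP, socket.if_nametoindex(ifname))
    # The join is requested through this socket; a socket that stays open
    # across a link bounce never issues it again.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreqn)
    return sock


def reopen_on_link_up(old_sock, ifname: str) -> socket.socket:
    """Hypothetical link-up handler: close the stale socket, open a new one."""
    if old_sock is not None:
        old_sock.close()
    return open_vrrp_socket(ifname)
```

The point of the sketch is that the join happens only when a socket is opened, so a handler that keeps the old descriptor has no path that rejoins the group.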

Test plan

  • Start keepalived with a VRRP instance on an interface
  • Bring the interface down: ip link set eth0 down
  • Bring the interface up: ip link set eth0 up
  • Verify that keepalived receives VRRP packets and transitions state correctly
  • Verify that multicast membership is restored: ip maddr show should list 224.0.0.18
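
The last check in the plan can be automated. The following is a rough Python sketch, not part of this PR: `lists_group` and `interface_has_vrrp_group` are hypothetical helpers, and the parsing assumes that `ip maddr show` prints each IPv4 group on a line starting with `inet`.

```python
import subprocess


def lists_group(maddr_output: str, group: str = "224.0.0.18") -> bool:
    """Return True if `ip maddr show` output contains an `inet <group>` line."""
    for line in maddr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet" and fields[1] == group:
            return True
    return False


def interface_has_vrrp_group(ifname: str) -> bool:
    """Run `ip maddr show dev <ifname>` and look for the VRRP group."""
    result = subprocess.run(
        ["ip", "maddr", "show", "dev", ifname],
        capture_output=True,
        text=True,
        check=True,
    )
    return lists_group(result.stdout)
```

Calling `interface_has_vrrp_group("eth0")` before and after bouncing the link should show whether the membership came back.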

Related

Fixes: #1847

When an interface goes down and back up (e.g., network driver restart),
the kernel drops multicast group membership. However, keepalived doesn't
re-add the membership because the VRRP sockets remain open (fd_in != -1).

This causes VRRP instances to stop receiving advertisements and stay
stuck in BACKUP state indefinitely, even though the interface is back up.

Fix by closing and reopening VRRP sockets in interface_up() to restore
multicast group membership. This mirrors the socket handling logic in
cleanup_lost_interface() but triggers on interface state change rather
than interface deletion.

Fixes: acassen#1847

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Aleksei Sviridkin <[email protected]>
@lexfrei lexfrei marked this pull request as ready for review December 21, 2025 13:10
lexfrei added a commit to lexfrei/charts that referenced this pull request Dec 21, 2025
…161)

# Pull Request

## Description

Add fully customizable probe configuration and trackInterface option for
the vipalived chart.

**Key features:**
- Configurable `livenessProbe.exec.command` (default: `pgrep
keepalived`, overridable)
- New `startupProbe` support for initial VIP acquisition scenarios
- New `readinessProbe` support
- `enabled` flag for all probes (livenessProbe enabled by default)
- `trackInterface` option as a workaround for the keepalived interface
recreation bug

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Documentation update
- [ ] Chart version bump

## Checklist

### Required for all PRs

- [x] I have tested these changes locally
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code

### Required for chart changes

- [x] Chart version has been bumped in `Chart.yaml`
- [x] **`Chart.yaml` annotations updated with changelog entries**
(`artifacthub.io/changes`)
- [x] `values.schema.json` has been updated (if new values were added)
- [x] `README.md` has been updated (if new values were added)
- [x] All new values have proper descriptions/comments
- [x] Tests have been added/updated in `tests/` directory
- [x] All tests pass locally (`helm unittest charts/<chart-name>`)
- [ ] Schema validation passes (`check-jsonschema --schemafile
charts/<chart-name>/values.schema.json charts/<chart-name>/values.yaml`)
- [x] Helm lint passes (`helm lint charts/<chart-name>`)

### If adding new templates

- [x] Templates follow Helm best practices
- [x] Templates use proper indentation (`nindent` instead of `indent`)
- [x] Conditional rendering uses `{{- if }}` to control whitespace

### If modifying existing functionality

- [x] Changes are backward compatible OR breaking changes are documented
- [x] Default values maintain existing behavior

## Additional context

Closes #160

This implementation follows a TDD approach, with 12 new tests added (96
total).

**Backward compatible:** Existing configurations work without changes.
Default livenessProbe behavior is preserved (`pgrep keepalived`).

**Related upstream fix:**
acassen/keepalived#2681

---------

Signed-off-by: Aleksei Sviridkin <[email protected]>
Co-authored-by: Claude <[email protected]>
lexfrei added a commit to lexfrei/k8s that referenced this pull request Dec 21, 2025
- Upgrade vipalived chart from 0.5.2 to 0.6.0
- Add trackInterface workaround for keepalived issue 1847
- Configure startupProbe to verify MASTER state (60s timeout)
- Configure livenessProbe to check VIP presence on interface

This ensures automatic pod restart if keepalived gets stuck in
BACKUP state after network interface recreation.

Refs: lexfrei/charts#160, acassen/keepalived#2681

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Aleksei Sviridkin <[email protected]>

Development

Successfully merging this pull request may close these issues.

Creating link or setting interface up with unicast_src before address assigned to i/f doesn't work
