feat: add unified status socket for health monitoring #2682

lexfrei · 2025-12-21T14:32:35Z

Motivation

Modern container orchestration platforms like Kubernetes require reliable health probes to manage application lifecycle. Currently, keepalived lacks a lightweight mechanism for external health checks that works across all its operational modes.

The Problem

Container environments need health endpoints. When running keepalived in Kubernetes:

- livenessProbe needs to verify keepalived is functioning correctly
- readinessProbe needs to know if this instance should receive traffic (MASTER vs BACKUP)
- Monitoring systems need programmatic access to keepalived state

Existing options are insufficient:

| Option | Limitation |
|---|---|
| Check process existence | Doesn't detect stuck/unhealthy state |
| Parse logs | Fragile, requires log access, not real-time |
| SNMP | Heavy dependency, complex setup for simple health checks |
| D-Bus | VRRP-only, requires D-Bus infrastructure |
| Notify scripts | One-way, can't query current state |

Need for unified status. Keepalived runs multiple child daemons:

- VRRP daemon (virtual router redundancy)
- Checker daemon (LVS real server health)
- BFD daemon (bidirectional forwarding detection)

A health check endpoint should report aggregated state from ALL active daemons, not just one.

The Solution

This PR adds an optional Unix domain socket that provides:

- Simple text protocol - no dependencies, works with socat, nc, or any socket client
- Unified status - aggregates state from VRRP, Checker, and BFD daemons
- Two query modes:
  - HEALTH - single-word response for probe scripts: MASTER, BACKUP, or FAULT
  - STATUS - JSON response with detailed per-daemon information
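For clients that prefer not to shell out to socat, the same exchange can be sketched in Python. This is a minimal sketch, not part of the PR: the function name `query_status_socket` is hypothetical, and it assumes the server accepts a newline-terminated command and closes the connection after writing its reply.

```python
import socket


def query_status_socket(path: str, command: str, timeout: float = 2.0) -> str:
    """Send one command ("HEALTH" or "STATUS") and return the reply text.

    Assumption: the command is newline-terminated and the server closes
    the connection once the reply has been written.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode("ascii") + b"\n")
        # Half-close the write side, mirroring `echo ... | socat -`.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # EOF: the server finished its reply
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").strip()
```

Calling `query_status_socket("/var/run/keepalived/status.sock", "HEALTH")` would then return the single-word state, provided the assumptions above hold for the real server.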

Architecture

The status socket runs in the parent (watchdog) process, not in child daemons. This follows keepalived's existing patterns (like BFD event pipes) and provides true unified health:

┌─────────────────────────────────────────────────────────┐
│                    PARENT PROCESS                       │
│                                                         │
│  ┌───────────────┐    ┌────────────────────────────┐    │
│  │ Status Socket │◄───│ Aggregated State           │    │
│  │ /var/run/...  │    │ - vrrp: MASTER/BACKUP/FAULT│    │
│  │ HEALTH/STATUS │    │ - checker: UP/DOWN/FAULT   │    │
│  └───────────────┘    │ - bfd: UP/DOWN/FAULT       │    │
│         ▲             └────────────────────────────┘    │
│         │                      ▲  ▲  ▲                  │
│  ┌──────┴──────────────────────┴──┴──┴──────────┐       │
│  │              Pipe Readers                    │       │
│  └──────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────┘
         ▲                    ▲                   ▲
         │ pipe               │ pipe              │ pipe
    ┌────┴────┐          ┌────┴────┐         ┌────┴────┐
    │  VRRP   │          │ Checker │         │   BFD   │
    │  Child  │          │  Child  │         │  Child  │
    └─────────┘          └─────────┘         └─────────┘

Children send status_event_t structs via pipes on state changes. The parent maintains the aggregated state and responds to socket queries.
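The event flow can be illustrated with a small Python model. Everything here is an assumption made for illustration: the two-byte record layout stands in for status_event_t (whose real fields are not shown in this description), and the aggregation rule (FAULT if any daemon reports FAULT or nothing has reported yet, otherwise MASTER or BACKUP from the VRRP state) is one plausible reading of the diagram rather than the PR's actual logic.

```python
import os
import struct

# Hypothetical wire layout: one byte for the daemon id, one for the state id.
EVENT = struct.Struct("=BB")
DAEMONS = ("vrrp", "checker", "bfd")
STATES = ("MASTER", "BACKUP", "FAULT", "UP", "DOWN")


def send_event(fd: int, daemon: str, state: str) -> None:
    """Child side: write one fixed-size event record to the pipe."""
    os.write(fd, EVENT.pack(DAEMONS.index(daemon), STATES.index(state)))


def read_event(fd: int) -> tuple:
    """Parent side: read one fixed-size event record from the pipe."""
    daemon_id, state_id = EVENT.unpack(os.read(fd, EVENT.size))
    return DAEMONS[daemon_id], STATES[state_id]


def aggregate_health(states: dict) -> str:
    """Collapse per-daemon states into a single HEALTH word (assumed rule)."""
    if not states or "FAULT" in states.values():
        return "FAULT"
    return "MASTER" if states.get("vrrp") == "MASTER" else "BACKUP"
```

A parent loop in this model would call `read_event` on each pipe, store the result in a dict keyed by daemon name, and answer HEALTH queries with `aggregate_health`.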

Usage

Configuration

global_defs {
    # Enable the status socket at the given path
    status_socket /var/run/keepalived/status.sock

    # Optional: set permissions (default: 0600)
    status_socket_mode 0660
}

Build

./configure --enable-status-socket
make

Querying

# Health check (for probes)
$ echo "HEALTH" | socat - UNIX:/var/run/keepalived/status.sock
MASTER

# Full status (for monitoring)
$ echo "STATUS" | socat - UNIX:/var/run/keepalived/status.sock
{"vrrp":{"state":"UP","instances":2,"fault":0,"master":1},"checker":{"state":"UP","instances":4,"fault":0}}
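A monitoring script might consume the STATUS reply along the following lines. This is a sketch that relies only on the fields visible in the sample output above (`state`, `fault`, `master`); the helper names are hypothetical and the full schema may contain more than is shown here.

```python
import json
from typing import List


def faulted_daemons(status_json: str) -> List[str]:
    """Return the daemons whose state is FAULT or whose fault count is non-zero."""
    status = json.loads(status_json)
    return [
        name
        for name, info in status.items()
        if info.get("state") == "FAULT" or info.get("fault", 0) > 0
    ]


def master_count(status_json: str) -> int:
    """Return the number of VRRP instances reported as master (0 if absent)."""
    return json.loads(status_json).get("vrrp", {}).get("master", 0)
```

For the sample reply above, `faulted_daemons` returns an empty list and `master_count` returns 1.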

Kubernetes Integration

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - echo HEALTH | socat - UNIX:/var/run/keepalived/status.sock | grep -qE "MASTER|BACKUP"
  initialDelaySeconds: 10
  periodSeconds: 5

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - echo HEALTH | socat - UNIX:/var/run/keepalived/status.sock | grep -q MASTER
  periodSeconds: 2
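The probe semantics above reduce to a small decision function, sketched here in Python for use in a custom probe script. The function name and the `"liveness"`/`"readiness"` labels are hypothetical. Unlike `grep -q MASTER`, which matches substrings, this sketch requires the reply to be exactly one of the documented words.

```python
def probe_exit_code(health_reply: str, probe: str) -> int:
    """Map a HEALTH reply to a probe exit code (0 = success, 1 = failure)."""
    reply = health_reply.strip()
    if probe == "liveness":
        # Alive as long as keepalived reports a non-fault state.
        ok = reply in ("MASTER", "BACKUP")
    elif probe == "readiness":
        # Ready for traffic only when this instance is MASTER.
        ok = reply == "MASTER"
    else:
        raise ValueError(f"unknown probe type: {probe!r}")
    return 0 if ok else 1
```

A probe script could pass the socket reply to this function and hand the result to `sys.exit`.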

Implementation Details

- ~600 lines of new code across core and daemon files
- Conditional compilation via --enable-status-socket / _WITH_STATUS_SOCKET_
- Non-blocking I/O using existing scheduler infrastructure
- No external dependencies - pure POSIX sockets
- Follows existing patterns - modeled after BFD event pipes for IPC

Files Changed

| File | Change |
|---|---|
| `configure.ac` | Add `--enable-status-socket` option |
| `keepalived/include/status_event.h` | NEW - IPC event structure |
| `keepalived/core/status_socket.c` | NEW - socket and pipe handling |
| `keepalived/core/main.c` | Pipe creation and socket init |
| `keepalived/vrrp/vrrp_daemon.c` | Close unused pipes in child |
| `keepalived/vrrp/vrrp_notify.c` | Send VRRP status events |
| `keepalived/check/check_daemon.c` | Close unused pipes in child |
| `keepalived/check/ipwrapper.c` | Send Checker status events |
| `keepalived/bfd/bfd_daemon.c` | Close unused pipes in child |
| `keepalived/bfd/bfd_event.c` | Send BFD status events |
| `doc/man/man5/keepalived.conf.5.in` | Documentation |

Testing

Tested manually with:

- VRRP-only configuration
- Multiple VRRP instances with state transitions
- Concurrent socket queries

Add optional status socket feature for monitoring VRRP state via unix domain socket. Useful for container health checks (e.g., Kubernetes liveness probes) without external dependencies.

Protocol:
- HEALTH command: returns MASTER/BACKUP/FAULT based on VRRP state
- STATUS command: returns JSON with all VRRP instance details

Enable with --enable-status-socket configure option. Configure via status_socket and status_socket_mode keywords.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Aleksei Sviridkin <[email protected]>

…atus

Move the status socket from VRRP child process to parent process to provide aggregated status from all daemons (VRRP, Checker, BFD).

Architecture changes:
- Status socket now runs in parent (watchdog) process
- Each child daemon sends status events via pipes to parent
- Parent aggregates state and responds to HEALTH/STATUS queries
- HEALTH returns FAULT/MASTER/BACKUP based on all daemons' state
- STATUS returns JSON with details from all running daemons

This follows the existing BFD event pipe pattern for child-to-parent IPC and provides true unified health monitoring for deployments that use multiple keepalived features.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Aleksei Sviridkin <[email protected]>

lexfrei and others added 2 commits December 21, 2025 16:54

lexfrei marked this pull request as ready for review December 21, 2025 16:35
