@jlon commented Oct 13, 2025

Overview

This PR introduces cluster-wide, capacity-based eviction management for Curvine. It automatically cleans up cached data when cluster storage usage exceeds a configurable threshold. Eviction is driven by high/low watermark rates relative to the cluster quota (a quota_rate-based strategy). Victims are chosen through a pluggable eviction policy, and only LRU is implemented so far.

Design Goals

  1. Cluster-Level Resource Management: Monitor and manage storage capacity across the entire Curvine cluster
  2. Automatic Eviction: Trigger automatic data cleanup when storage usage exceeds the high watermark, and continue until usage drops below the low watermark
  3. Policy-Based Selection: Select eviction victims through a pluggable policy, with LRU as the initial implementation
  4. Non-Disruptive Operation: Perform eviction in the background without impacting normal file operations
  5. Configurable Behavior: Expose the watermarks, eviction mode, eviction policy, and scan page size as configuration options for tuning eviction behavior

Implementation Architecture

Core Components

  1. QuotaManager (curvine-server/src/master/quota/quota_manager.rs)

    • Central coordinator for cluster-wide eviction
    • Receives triggers from HeartbeatChecker
    • Creates eviction plans based on watermark thresholds
    • Orchestrates victim selection and deletion
  2. Evictor (curvine-server/src/master/quota/eviction/evictor.rs)

    • Trait: Defines interface for eviction policy implementations
    • LRUEvictor: Tracks file access patterns using an LRU cache
    • Methods:
      • on_access(): Updates access tracking when files are read
      • select_victims(): Returns the least recently used files as eviction candidates (peek only; does not remove them from the cache)
      • remove_victims(): Removes successfully evicted files from tracking cache
  3. EvictionConf (curvine-server/src/master/quota/eviction/types.rs)

    • Configuration structure parsed from MasterConf
    • Defines the high/low watermark rates, eviction policy, eviction mode, and per-round scan page size
  4. HeartbeatChecker Integration (curvine-server/src/master/fs/heartbeat_checker.rs)

    • Triggers eviction checks during periodic worker heartbeat scans
    • Passes cluster capacity info to QuotaManager
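The Evictor contract described above can be illustrated with a minimal sketch that uses only the Rust standard library. The exact trait signatures, the `(path, size)` tuples, and the `LruEvictor` internals (a monotonically increasing access tick plus a tick-ordered `BTreeMap`) are assumptions for illustration, not the code in evictor.rs. The sketch highlights the split of responsibilities: select_victims() only peeks, and remove_victims() is the only place entries leave the cache.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical mirror of the Evictor trait; real signatures may differ.
pub trait Evictor {
    /// Record that `path` (of `size` bytes) was just accessed.
    fn on_access(&mut self, path: &str, size: u64);
    /// Peek at up to `n` least recently used files without removing them.
    fn select_victims(&self, n: usize) -> Vec<(String, u64)>;
    /// Stop tracking files that were actually evicted.
    fn remove_victims(&mut self, paths: &[String]);
}

/// Hypothetical LRU tracker ordered by a logical access counter.
#[derive(Default)]
pub struct LruEvictor {
    tick: u64,
    // path -> (last access tick, size in bytes)
    entries: HashMap<String, (u64, u64)>,
    // last access tick -> path; the smallest tick is the LRU file
    order: BTreeMap<u64, String>,
}

impl Evictor for LruEvictor {
    fn on_access(&mut self, path: &str, size: u64) {
        self.tick += 1;
        // Drop the old position so each path appears once in `order`.
        if let Some((old_tick, _)) = self.entries.get(path) {
            self.order.remove(old_tick);
        }
        self.entries.insert(path.to_string(), (self.tick, size));
        self.order.insert(self.tick, path.to_string());
    }

    fn select_victims(&self, n: usize) -> Vec<(String, u64)> {
        // Iterate from the oldest tick; nothing is mutated (peek only).
        self.order
            .values()
            .take(n)
            .map(|p| (p.clone(), self.entries[p].1))
            .collect()
    }

    fn remove_victims(&mut self, paths: &[String]) {
        for p in paths {
            if let Some((tick, _)) = self.entries.remove(p) {
                self.order.remove(&tick);
            }
        }
    }
}
```

Because victims are only peeked, a failed delete or free leaves the file tracked in its LRU position. It can then be retried in a later round.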

Eviction Flow

HeartbeatChecker (every 2s)
    ↓
Get MasterInfo (capacity, available, fs_used)
    ↓
QuotaManager.detector(info)
    ↓
Calculate usage_ratio = curvine_used / curvine_quota
    ↓
If usage_ratio >= high_watermark (e.g. 0.80, i.e. 80%):
    ↓
Create EvictPlan: target_free_bytes = curvine_used - (low_watermark * curvine_quota)
    ↓
Loop until target_free_bytes <= 0 OR no more victims:
    │
    ├─> Evictor.select_victims(scan_page) → Get LRU files (peek only)
    │
    ├─> Calculate total_freed (sum of file sizes)
    │
    ├─> Execute eviction (delete or free files, per quota_eviction_mode)
    │
    ├─> Evictor.remove_victims() → Remove from LRU cache
    │
    └─> Update target_free_bytes -= total_freed
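The flow above can be sketched as a planning function plus a round-based loop. The names `plan_target_free`, `VictimSource`, `VecSource`, and `run_eviction` are hypothetical, and the `evict` callback stands in for the real delete/free call. This sketch differs from the diagram in one respect: it counts and untracks only files whose eviction succeeded, not every selected file. That choice follows the note that remove_victims() handles successfully evicted files.

```rust
/// Hypothetical stand-in for the two Evictor methods the loop uses.
pub trait VictimSource {
    /// Peek at up to `n` candidates as (path, size_in_bytes).
    fn select_victims(&self, n: usize) -> Vec<(String, u64)>;
    /// Stop tracking files that were actually evicted.
    fn remove_victims(&mut self, paths: &[String]);
}

/// Toy source for illustration: files kept in eviction order (front = LRU).
pub struct VecSource(pub Vec<(String, u64)>);

impl VictimSource for VecSource {
    fn select_victims(&self, n: usize) -> Vec<(String, u64)> {
        self.0.iter().take(n).cloned().collect()
    }
    fn remove_victims(&mut self, paths: &[String]) {
        self.0.retain(|(p, _)| !paths.contains(p));
    }
}

/// Bytes to free to get back to the low watermark, or None when usage is
/// still below the high watermark (or the quota is zero).
pub fn plan_target_free(used: u64, quota: u64, high: f64, low: f64) -> Option<u64> {
    if quota == 0 {
        return None;
    }
    let usage_ratio = used as f64 / quota as f64;
    if usage_ratio < high {
        return None;
    }
    let low_bytes = (low * quota as f64) as u64;
    Some(used.saturating_sub(low_bytes))
}

/// Peek -> evict -> untrack, one page at a time, until the target is met
/// or no evictable victims remain. Returns the total bytes freed.
pub fn run_eviction<S, F>(src: &mut S, mut target_free: u64, scan_page: usize, mut evict: F) -> u64
where
    S: VictimSource,
    F: FnMut(&str) -> bool,
{
    let mut freed_total = 0;
    while target_free > 0 {
        let victims = src.select_victims(scan_page);
        if victims.is_empty() {
            break; // no more victims
        }
        let mut evicted = Vec::new();
        let mut round_freed = 0;
        for (path, size) in victims {
            if evict(path.as_str()) {
                round_freed += size;
                evicted.push(path);
            }
        }
        if evicted.is_empty() {
            break; // every candidate failed; avoid re-selecting them forever
        }
        src.remove_victims(&evicted);
        freed_total += round_freed;
        target_free = target_free.saturating_sub(round_freed);
    }
    freed_total
}
```

A whole page is evicted even if the target is crossed partway through it. As a result, the amount actually freed can overshoot target_free_bytes by up to one page of files.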

Configuration Parameters

[master]
# Enable cluster-wide eviction
enable_quota_eviction = false

# Eviction mode: "free" (keep metadata) or "delete" (remove completely)
quota_eviction_mode = "delete"

# Eviction policy: only "lru" is implemented in this PR ("lfu" and "arc" are not yet supported)
quota_eviction_policy = "lru"

# High watermark: trigger eviction when usage exceeds this ratio
quota_eviction_high_rate = 0.80  # 80%

# Low watermark: continue eviction until usage drops below this ratio
quota_eviction_low_rate = 0.60   # 60%

# Number of files to scan per eviction round
quota_eviction_scan_page = 10
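EvictionConf could turn these keys into typed, validated values along the lines of the sketch below. The field names, the `EvictionMode` enum, and the validation rules are assumptions, not the parsing code in types.rs. The rules are 0 < low rate < high rate <= 1, "lru" as the only accepted policy, and a non-zero scan page. Reading the TOML file itself is omitted.

```rust
use std::str::FromStr;

/// Hypothetical typed form of quota_eviction_mode.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EvictionMode {
    Free,   // keep metadata, release cached data
    Delete, // remove the file completely
}

impl FromStr for EvictionMode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "free" => Ok(EvictionMode::Free),
            "delete" => Ok(EvictionMode::Delete),
            other => Err(format!("unknown quota_eviction_mode: {other}")),
        }
    }
}

/// Hypothetical mirror of the [master] eviction keys.
#[derive(Debug, Clone, PartialEq)]
pub struct EvictionConf {
    pub enabled: bool,     // enable_quota_eviction
    pub mode: EvictionMode, // quota_eviction_mode
    pub policy: String,    // quota_eviction_policy
    pub high_rate: f64,    // quota_eviction_high_rate
    pub low_rate: f64,     // quota_eviction_low_rate
    pub scan_page: usize,  // quota_eviction_scan_page
}

impl EvictionConf {
    /// Assumed sanity checks; the real implementation may differ.
    pub fn validate(&self) -> Result<(), String> {
        if !(0.0 < self.low_rate && self.low_rate < self.high_rate && self.high_rate <= 1.0) {
            return Err(format!(
                "need 0 < low_rate ({}) < high_rate ({}) <= 1",
                self.low_rate, self.high_rate
            ));
        }
        if self.policy != "lru" {
            return Err(format!("unsupported quota_eviction_policy: {}", self.policy));
        }
        if self.scan_page == 0 {
            return Err("quota_eviction_scan_page must be > 0".to_string());
        }
        Ok(())
    }
}
```

Requiring low_rate to be strictly below high_rate leaves a gap between the two watermarks. Without that gap, usage hovering near a single threshold would re-trigger eviction on nearly every heartbeat.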

Usage

Basic Setup

  1. Enable eviction in the [master] section of the configuration:
    enable_quota_eviction = true
    quota_eviction_high_rate = 0.80
    quota_eviction_low_rate = 0.60
  2. Monitor eviction activity in the master logs:
    cluster-evict: starting eviction, curvine_used=XXX, curvine_quota=YYY, usage_ratio=ZZ%
    cluster-evict: reached target_free_bytes, stopping eviction
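The detector arithmetic can be worked through using the Basic Setup watermarks and a hypothetical cluster with curvine_quota = 1000 GiB and curvine_used = 850 GiB:

$$
\text{usage\_ratio} = \frac{\text{curvine\_used}}{\text{curvine\_quota}} = \frac{850}{1000} = 0.85 \ge 0.80 \;\Rightarrow\; \text{eviction starts}
$$

$$
\text{target\_free\_bytes} = \text{curvine\_used} - \text{low\_watermark} \times \text{curvine\_quota} = 850 - 0.60 \times 1000 = 250\ \text{GiB}
$$

Eviction then runs in rounds of quota_eviction_scan_page files. Once roughly 250 GiB has been freed, the "reached target_free_bytes" log line would be expected.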
    
