Open
Description
Feature Description
Background
Currently, Gitea creates a new git cat-file --batch
subprocess for each request when handling git operations. While this approach is straightforward, it leads to the following issues in high-concurrency scenarios:
- High system overhead due to frequent subprocess creation/destruction
- Increased response latency as each request requires subprocess initialization
- Potential overconsumption of system resources (such as file descriptors)
- Non-gogit version of Gitea in Windows is slow for git related operations
Proposal Overview
Design and implement a lightweight git cat-file --batch
subprocess manager to improve performance and resource utilization through subprocess reuse. Key features include:
- Maintaining subprocess pools organized by repository path
- Dynamically allocating idle subprocesses to handle requests
- Automatically recycling long-idle subprocesses
- Gracefully handling high-load situations
Detailed Design
Subprocess Manager Structure
type GitCatFileManager struct {
// Subprocess pools indexed by repository path
procPools map[string]*ProcPool
mutex sync.RWMutex
maxProcsPerRepo int // Maximum number of subprocesses per repository
idleTimeout time.Duration // Idle timeout period
}
type ProcPool struct {
repoPath string
processes []*GitCatFileProcess
mutex sync.Mutex
}
type GitCatFileProcess struct {
cmd *exec.Cmd
stdin io.WriteCloser
stdout io.ReadCloser
lastUsed time.Time
inUse bool
mutex sync.Mutex
}
Core Functionality Implementation
Acquiring a Subprocess
func (m *GitCatFileManager) Get(repoPath string) (*GitCatFileProcess, error) {
m.mutex.RLock()
pool, exists := m.procPools[repoPath]
m.mutex.RUnlock()
if !exists {
m.mutex.Lock()
// Double-check to avoid race conditions
pool, exists = m.procPools[repoPath]
if !exists {
pool = &ProcPool{repoPath: repoPath}
m.procPools[repoPath] = pool
}
m.mutex.Unlock()
}
return pool.getProcess()
}
func (p *ProcPool) getProcess() (*GitCatFileProcess, error) {
p.mutex.Lock()
defer p.mutex.Unlock()
// Look for an idle process
for _, proc := range p.processes {
if !proc.inUse {
proc.inUse = true
proc.lastUsed = time.Now()
return proc, nil
}
}
// Check if maximum limit has been reached
if len(p.processes) >= maxProcsPerRepo {
return nil, errors.New("reached max processes limit for repository")
}
// Create a new process
proc, err := newGitCatFileProcess(p.repoPath)
if err != nil {
return nil, err
}
p.processes = append(p.processes, proc)
return proc, nil
}
Creating a New Subprocess
func newGitCatFileProcess(repoPath string) (*GitCatFileProcess, error) {
cmd := exec.Command("git", "-C", repoPath, "cat-file", "--batch")
stdin, err := cmd.StdinPipe()
if err != nil {
return nil, err
}
stdout, err := cmd.StdoutPipe()
if err != nil {
stdin.Close()
return nil, err
}
if err := cmd.Start(); err != nil {
stdin.Close()
stdout.Close()
return nil, err
}
return &GitCatFileProcess{
cmd: cmd,
stdin: stdin,
stdout: stdout,
lastUsed: time.Now(),
inUse: true,
}, nil
}
Releasing a Subprocess
func (m *GitCatFileManager) Release(proc *GitCatFileProcess) {
proc.mutex.Lock()
proc.inUse = false
proc.lastUsed = time.Now()
proc.mutex.Unlock()
}
Periodic Cleanup
func (m *GitCatFileManager) StartCleaner(interval time.Duration) {
ticker := time.NewTicker(interval)
go func() {
for range ticker.C {
m.cleanIdleProcesses()
}
}()
}
func (m *GitCatFileManager) cleanIdleProcesses() {
now := time.Now()
m.mutex.Lock()
defer m.mutex.Unlock()
for repoPath, pool := range m.procPools {
pool.mutex.Lock()
activeProcs := make([]*GitCatFileProcess, 0, len(pool.processes))
for _, proc := range pool.processes {
proc.mutex.Lock()
if !proc.inUse && now.Sub(proc.lastUsed) > m.idleTimeout {
// Close long-idle processes
proc.stdin.Close()
proc.cmd.Process.Kill()
proc.cmd.Wait() // Avoid zombie processes
proc.mutex.Unlock()
} else {
proc.mutex.Unlock()
activeProcs = append(activeProcs, proc)
}
}
pool.processes = activeProcs
pool.mutex.Unlock()
// Remove empty process pools
if len(pool.processes) == 0 {
delete(m.procPools, repoPath)
}
}
}
start the manager
// Global instance
var gitCatFileManager = NewGitCatFileManager(
10, // Maximum subprocesses per repository
5*time.Minute, // Idle timeout period
)
func init() {
// Start the cleanup goroutine, checking once per minute
gitCatFileManager.StartCleaner(1 * time.Minute)
}
Implementation Considerations
- Error Handling: Detect and handle subprocess abnormal exit situations
- Thread Safety: Use appropriate mutex locks to ensure concurrency safety
- Resource Limits: Add a global maximum process limit to prevent resource exhaustion
- Monitoring Metrics: Add monitoring for subprocess pool usage to facilitate troubleshooting
Performance Expectations
- Reduced Latency: Most requests use already-initialized subprocesses, avoiding startup overhead
- Increased Throughput: Reduced system-level call overhead in high-concurrency scenarios
- Lowered Resource Consumption: Control of total subprocess count prevents excessive resource usage
Drawbacks
- Increased Complexity: The solution adds complexity to Gitea's codebase with new data structures, synchronization mechanisms, and lifecycle management that will need to be maintained.
- Memory Footprint: Long-running subprocesses will consume more memory over time compared to short-lived ones. Each cached subprocess maintains open file handles and memory buffers.
- UnReleased sub process maybe stuck forever, there should be timeout for a subprocess session.
- When a
git cat-file --batch
run for a long time and repository updated, what will happen.
TODO
- Implement a basic version and conduct performance benchmark tests
- Consider adding subprocess health check mechanisms
- Integrate with Gitea's monitoring and trace system