runn loadt Performance Degradation Due to HTTP Connection Pool Mismanagement #1410
Description
Summary
runn loadt exhibits significant performance degradation compared to dedicated load testing tools like k6, particularly around connection handling. This
issue affects both client-side throughput (low RPS) and server-side performance (increased latency on operations like database BEGIN).
Root Cause Analysis
Three issues in the current implementation cause HTTP connections to be repeatedly created and destroyed instead of being reused across load test
iterations.
- copyOperators() recreates HTTP clients on every iteration (highest impact)
On every load test iteration after the first, copyOperators() calls New() which re-parses the runbook YAML and creates entirely new operators —
including new http.Client instances with fresh http.Transport (connection pools).
func copyOperators(ops []*operator, opts []Option) ([]*operator, error) {
    var c []*operator
    for _, op := range ops {
        // FIXME: Need the function to copy the operator as it is heavy to parse the runbook each time
        oo, err := New(append([]Option{Book(op.bookPath)}, opts...)...)
        // ...

The old HTTP clients are never explicitly closed (operator.Close() at operator.go:150-171 handles gRPC, CDP, SSH, and DB runners but not HTTP runners),
so they are left for garbage collection with their connection pools still open.
Impact: Every iteration forces new TCP + TLS handshakes. No connection reuse across iterations. The same issue exists in randomOperators() at
operator.go:2066-2082.
- MaxIdleConnsPerHost defaults to 2 (Go stdlib default)
File: http.go:80-100
tp, ok := http.DefaultTransport.(*http.Transport)
// ...
client: &http.Client{
    Transport: tp.Clone(), // Go default: MaxIdleConnsPerHost = 2
    Timeout:   time.Second * 30,
},

Go's http.DefaultTransport is cloned without tuning. The default MaxIdleConnsPerHost = 2 means that even when connections could be reused, only 2 idle
connections are kept per host. With --load-concurrent > 2, additional workers must create new TCP connections.
- TLS configuration is cloned per-request
File: http.go:425-432
if ts, ok := rnr.client.Transport.(*http.Transport); ok {
    existingConfig := ts.TLSClientConfig
    if existingConfig != nil {
        ts.TLSClientConfig = existingConfig.Clone()
    } else {
        ts.TLSClientConfig = new(tls.Config)
    }
    ts.TLSClientConfig.InsecureSkipVerify = rnr.skipVerify
}

On every HTTP request execution, TLSClientConfig is cloned or newly created. This mutates the shared Transport object, which would cause a data race if
the transport were shared across goroutines. It also prevents TLS session reuse.
Server-Side Impact
The client-side connection mismanagement cascades to the server:
- Excessive TCP handshakes — Server must accept() and spawn new goroutines/threads for each connection instead of handling a stable pool
- TIME_WAIT socket accumulation — Rapidly created and abandoned connections pile up in TIME_WAIT state on both client and server
- Resource exhaustion — Under high concurrency, the server faces goroutine explosion, file descriptor exhaustion, and kernel TCP stack pressure
- Downstream effects — Database connection pools become contended, causing operations like BEGIN to take abnormally long
With k6, virtual users reuse connections throughout their lifetime, so the server only manages a stable, predictable number of connections.
Proposed Fix
Fix 1: Reuse HTTP runners in copyOperators() / randomOperators()
The pattern for runner reuse already exists: exportOptionsToBePropagated() in include.go:230-261 exports reuseHTTPRunner options, which share the
underlying *httpRunner pointer (and its http.Client). This is already used for included (nested) runbooks.
Apply the same pattern in copyOperators():
func copyOperators(ops []*operator, opts []Option) ([]*operator, error) {
    var c []*operator
    for _, op := range ops {
        reuseOpts := op.exportOptionsToBePropagated()
        allOpts := append([]Option{Book(op.bookPath)}, opts...)
        allOpts = append(allOpts, reuseOpts...)
        oo, err := New(allOpts...)
        if err != nil {
            return nil, err
        }
        oo.id = op.id
        c = append(c, oo)
    }
    return c, nil
}
Note: reuseOpts must come after Book() and opts because Book() internally calls merge() which uses maps.Copy and would overwrite the reused runners.
Thread safety: http.Client and http.Transport are documented as goroutine-safe. Cookies are managed per-operator via operator.store.Cookies(), not via
http.Client.Jar (which is nil), so sharing the client does not leak cookies between operators.
Fix 2: Increase MaxIdleConnsPerHost
In newHTTPRunner(), tune the cloned transport:
cloned := tp.Clone()
cloned.MaxIdleConnsPerHost = 100

This has no impact on normal scenario testing (idle connections are automatically cleaned up by timeout).
Fix 3: Move TLS configuration to initialization time
Move TLS setup from per-request (httpRunner.run()) to initialization time using sync.Once, making it safe to share runners across goroutines:
type httpRunner struct {
    // ... existing fields ...
    tlsOnce sync.Once
    tlsErr  error // cached so every caller observes the first result
}

func (rnr *httpRunner) configureTLS() error {
    rnr.tlsOnce.Do(func() {
        // Configure TLS once at init time; store any failure in
        // rnr.tlsErr so repeat callers get the same error back.
    })
    return rnr.tlsErr
}
Important: This fix is a prerequisite for Fix 1 — sharing runners without fixing the per-request TLS mutation would introduce a data race.
Fix 4: Clean up HTTP idle connections in operatorN.Close()
Add CloseIdleConnections() to httpRunner and call it in operatorN.Close() (not operator.Close(), since runners are shared):
func (opn *operatorN) Close() {
    closed := map[*httpRunner]bool{}
    for _, op := range opn.ops {
        for _, r := range op.httpRunners {
            if !closed[r] {
                r.CloseIdleConnections()
                closed[r] = true
            }
        }
    }
    for _, op := range opn.ops {
        op.Close(true)
    }
}