Go Optimization Guide Map for Neva
This issue uses goperf.dev as an external reference and maps its performance themes to the Neva codebase.
The goal is not to blindly optimize everything. The goal is to identify the places where the guide's ideas are the best fit for Neva's compiler, runtime, CLI, and standard library, and to track related work in one place.
Status as of March 22, 2026
This issue should be treated as an umbrella performance map, not as a flat todo list.
Some of the highest-priority ideas here already have concrete in-flight implementation work:
- #1004 Redesign runtime `Msg`
  - Directly targets the `runtime.Msg` boxing / message-overhead track.
  - Replaces the old interface-based runtime message representation with a tagged-union design.
  - Adds focused runtime microbenchmarks for message operations.
- #1023 Add runtime benchmark baseline before `Msg` redesign
  - Adds the benchmark baseline needed to compare #1004 and future runtime changes.
  - Expands benchmarking beyond the old single `benchmarks/message_passing` case into a broader e2e suite under `benchmarks/runtime_bench/**`.
- #996 Proposal: Optimize Message representation via native Go types
  - Still relevant as a likely follow-up direction after #1004, especially for native composite storage / fast paths.
- #1030
  - Related correctness work for JSON formatting in the same serialization area touched by #1004.
Important status note: as of March 22, 2026, neither #1004 nor #1023 is merged into main, so this issue should distinguish between:
- work already prototyped in open PRs
- work still missing from the repository default branch
Reading the map
- Fit = how relevant the pattern is to Neva today.
- Priority = suggested order for investigation.
- Candidate = a concrete improvement idea for this repository.
- Status = whether the work is already in `main`, being prototyped in an open PR, or still backlog.
1. Common Go performance patterns mapped to Neva
| goperf topic | Neva fit | Priority | Status | Candidate for Neva |
|---|---|---|---|---|
| Avoiding Interface Boxing | Very high | P0 | In progress via #1004 | Rework runtime.Msg hot paths, then remeasure and decide whether Neva still needs specialized scalar fast paths, slimmer stream-item representations, or native-composite follow-ups from #996. |
| Memory Efficiency and Go's Garbage Collector | Very high | P0 | Partially covered by #1004 and #1023 | Use the new microbench/e2e baselines to review per-message wrappers, repeated StructMsg construction/copying, and StructMsg.Get scans before introducing pools or unsafe tricks. |
| Stack Allocations and Escape Analysis | Very high | P0 | Not yet done | Run -gcflags=all=-m=2 over runtime/backend packages on top of the benchmark baseline from #1023 and the runtime changes from #1004. |
| Memory Preallocation | High | P0 | Backlog | Extend existing preallocation discipline to runtime helpers such as stream_zip_many, array-port helpers, and suspicious len(chans)^2 capacity calculations. |
| Zero-Copy Techniques | High | P1 | Backlog | Audit stdlib/runtime I/O. http_get still eagerly reads the full response body, and transport semantics around bytes vs string should stay explicit. |
| Goroutine Worker Pools | High | P1 | Backlog | Review fan-out/fan-in helpers that spawn goroutines per slot or per iteration (ReceiveAll, SendAll, struct_builder, stream_zip_many, runtime call orchestration). Neva's model is intentionally parallel, so the goal is bounded concurrency for high-cardinality helpers, not “less concurrency everywhere”. |
| Batching Operations | High | P1 | Backlog | Add batched variants or internal batching only where benchmarks show per-message overhead dominates useful work. |
| Immutable Data Sharing | High | P1 | Backlog | Document and enforce where slices/maps inside messages are treated as immutable payloads versus defensively copied payloads. This is especially relevant for transport-oriented payloads and composite messages. |
| Atomic Operations and Synchronization Primitives | Medium | P2 | Backlog | Measure whether the global send-order counter becomes a contention point under fan-out heavy benchmarks and whether ordering can be made optional on paths that never call Select. |
| Efficient Context Management | Medium | P2 | Backlog | The runtime creates/cancels contexts in Call, and stdlib network code still does not thread request-scoped context into http_get. |
| Efficient Buffering | Medium | P2 | Backlog | Most compiler codegen already uses bytes.Buffer / strings.Builder. Higher-value follow-up is in stdlib I/O and transport-facing components. |
| Struct Field Alignment | Medium | P2 | Backlog | Run a focused field-alignment pass on runtime/compiler structs and compare against the existing betteralign workflow from Makefile. |
| Object Pooling | Medium | P2 | Backlog | Benchmark whether runtime hot paths allocate enough temporary slices/messages to justify scratch-buffer pooling. Avoid applying pools to compiler codegen before measurement. |
| Lazy Initialization | Medium | P3 | Backlog | Cache/lazily initialize cold shared state only if startup profiling shows it matters. |
| Leveraging Compiler Optimization Flags | Already covered well | P3 | Mostly in main | Keep current release defaults; possible follow-up is a dedicated debug profile with -gcflags="all=-N -l". |
2. Networking and diagnostics topics: which ones matter to Neva
Most of goperf's networking section targets long-lived Go services. Neva is primarily a compiler/runtime toolchain, so these topics are secondary, but not irrelevant.
| goperf topic | Relevance to Neva | Status / candidate |
|---|---|---|
| Benchmarking and Load Testing for Networked Go Apps | High as methodology, even outside networking | #1023 is the first concrete step here. Follow-up: land it, document how results are tracked, and pair it with lower-level runtime benchmarks from #1004. |
| Practical Example: Profiling Networked Go Applications with pprof | High as diagnostics reference | Add a short perf-playbook doc for Neva: go test -bench, -benchmem, CPU profiles, alloc profiles, and escape-analysis reports on runtime/compiler packages. |
| Efficient Use of net/http, net.Conn, and UDP | Medium | Refactor http_get to use a reusable http.Client, explicit timeouts, request context propagation, and possibly streaming variants. |
| Managing 10K+ Concurrent Connections | Low today | Keep as background reading unless Neva grows daemonized services, language servers, or remote execution endpoints inside this repo. |
| Scheduler / epoll / TLS / DNS / connection lifecycle pages | Low today | Background reading only for future infrastructure work. |
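As a methodology sketch for the benchmarking rows above (illustrative only; the measured operation is a stand-in, not Neva code), `testing.Benchmark` can drive quick ad-hoc measurements outside `go test`, which is handy while iterating on message representations:

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sink prevents the compiler from eliminating the work.
var sink interface{}

// boxAllocsPerOp measures allocations per iteration for a stand-in
// hot-path operation: boxing an int into an interface.
func boxAllocsPerOp() int64 {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // record allocation stats, like -benchmem
		for i := 0; i < b.N; i++ {
			sink = i + 1000 // values > 255 are not interned, so each boxing allocates
		}
	})
	return res.AllocsPerOp()
}

func main() {
	fmt.Println("allocs/op:", boxAllocsPerOp())
}
```

For tracked results the same function bodies belong in regular `Benchmark*` functions run via `go test -bench=. -benchmem`, as #1023 does; this form is only for quick local comparisons.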
3. What Neva already does well
Several goperf ideas are already visible either in main or in active related PRs:
- The compiler already preallocates maps/slices in analysis and code generation paths where sizes are known ahead of time.
- `main` already has a benchmark entry point in `benchmarks/message_passing`, and #1023 expands that into a broader runtime benchmark suite.
- Release builds already use size-oriented linker flags and reproducibility-friendly build flags.
- Runtime code already uses `strings.Builder`, `bytes.Buffer`, and atomics in a few targeted places instead of overusing them everywhere.
That means the highest-value work is not “introduce basic Go performance hygiene”. The highest-value work is to put numbers around the runtime's message-passing cost model and then use those numbers to drive the next runtime changes.
4. Recommended sequence for Neva
P0 — measure and reduce message overhead
- Land and rebase the benchmark/message work already in flight.
- Run `go test -bench=. -benchmem` on the touched runtime packages and benchmark suites.
- Run escape-analysis reports (`-gcflags=all=-m=2`) for runtime/backend packages.
- Use the results to decide whether boxing, copying, or goroutine churn is the dominant cost.
P1 — fix hot runtime patterns
- Audit goroutine-per-slot helpers and compare them with bounded worker patterns.
- Audit byte/string conversions and whole-buffer I/O APIs.
- Introduce batching only where benchmarks show that message granularity dominates useful work.
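The bounded-worker audit above can be sketched with a semaphore channel: fan-out still runs in parallel, but in-flight goroutines are capped, which is the relevant pattern for high-cardinality helpers. This is an illustrative minimal version (names and shape are hypothetical, not Neva's helpers):

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut applies f to every item concurrently, but never runs more than
// maxWorkers goroutines at once. Each goroutine writes to its own result
// slot, so no mutex is needed for the output slice.
func fanOut(items []int, maxWorkers int, f func(int) int) []int {
	results := make([]int, len(items)) // preallocate one slot per item
	sem := make(chan struct{}, maxWorkers)
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxWorkers goroutines are in flight
		go func(i, item int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = f(item)
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	out := fanOut([]int{1, 2, 3, 4}, 2, func(v int) int { return v * v })
	fmt.Println(out) // → [1 4 9 16]
}
```

The point, per the note in the table above, is bounded concurrency for high-cardinality helpers, not less concurrency everywhere: small fixed fan-outs can keep the goroutine-per-slot shape.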
P2 — smaller structural wins
- Field-alignment pass on selected structs.
- Ordering/atomic contention review for the global runtime counter.
- Context propagation and timeout cleanup for stdlib HTTP.
P3 — polish and tooling
- Add a documented perf-playbook / debug profile.
- Cache/lazily initialize cold shared state only if startup profiling shows it matters.
- Keep this issue updated as concrete benchmark and runtime work lands.
5. Suggested first issues / PRs
- Land and rebase the benchmark/message work already in flight.
- Add a perf-playbook doc (`bench`, `benchmem`, `pprof`, escape analysis) that explains how to use the new benchmarks and how to compare results before/after runtime changes.
- Run escape-analysis and allocation review on top of #1004 / #1023.
- Review goroutine-per-slot helpers for bounded-concurrency alternatives.
- Refactor `http_get` to use context-aware requests and a reusable client, while keeping transport semantics explicit (`bytes` vs `string`) and aligning with the current stdlib direction.
Source reference
Primary source: goperf.dev
Key reference pages used while preparing this map:
- https://goperf.dev/
- https://goperf.dev/01-common-patterns/interface-boxing/
- https://goperf.dev/01-common-patterns/stack-alloc/
- https://goperf.dev/01-common-patterns/atomic-ops/
- https://goperf.dev/01-common-patterns/buffered-io/
- https://goperf.dev/01-common-patterns/comp-flags/
- https://goperf.dev/02-networking/efficient-net-use/
- https://goperf.dev/02-networking/gc-endpoint-profiling/