Improve Neva performance using goperf.dev as a reference

# Go Optimization Guide Map for Neva

This issue uses [goperf.dev](https://goperf.dev/) as an external reference and maps its performance themes to the Neva codebase.

The goal is **not** to blindly optimize everything. The goal is to identify the places where the guide's ideas are the best fit for Neva's compiler, runtime, CLI, and standard library, and to track related work in one place.

## Status as of March 22, 2026

This issue should be treated as an **umbrella performance map**, not as a flat todo list.

Some of the highest-priority ideas here already have concrete in-flight implementation work:

- [#1004 Redesign runtime `Msg`](https://github.com/nevalang/neva/pull/1004)
  - Directly targets the `runtime.Msg` boxing / message-overhead track.
  - Replaces the old interface-based runtime message representation with a tagged-union design.
  - Adds focused runtime microbenchmarks for message operations.
- [#1023 Add runtime benchmark baseline before Msg redesign](https://github.com/nevalang/neva/pull/1023)
  - Adds the benchmark baseline needed to compare `#1004` and future runtime changes.
  - Expands benchmarking beyond the old single `benchmarks/message_passing` case into a broader e2e suite under `benchmarks/runtime_bench/**`.
- [#996 Proposal: Optimize Message representation via native Go types](https://github.com/nevalang/neva/issues/996)
  - Still relevant as a likely follow-up direction after `#1004`, especially for native composite storage / fast paths.
- [#1030](https://github.com/nevalang/neva/pull/1030)
  - Related correctness work for JSON formatting in the same serialization area touched by `#1004`.
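
To make the boxing concern behind `#1004` concrete, here is a minimal sketch of a tagged-union message next to an interface-based one. It is not the design from `#1004`: the `Msg`, `Kind`, `IntMsg`, `StrMsg`, `Sum`, and `SumBoxed` names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Kind tags which field of Msg is meaningful (hypothetical).
type Kind uint8

const (
	KindInt Kind = iota
	KindStr
)

// Msg is a hypothetical tagged-union message: scalars are stored inline,
// so building one does not require an interface payload.
type Msg struct {
	kind Kind
	num  int64
	str  string
}

func IntMsg(v int64) Msg  { return Msg{kind: KindInt, num: v} }
func StrMsg(s string) Msg { return Msg{kind: KindStr, str: s} }

// Int returns the integer payload and whether the message holds one.
func (m Msg) Int() (int64, bool) { return m.num, m.kind == KindInt }

// Sum adds integer messages by checking the tag, without type assertions.
func Sum(msgs []Msg) int64 {
	var total int64
	for _, m := range msgs {
		if v, ok := m.Int(); ok {
			total += v
		}
	}
	return total
}

// SumBoxed is the interface-based equivalent, shown for comparison.
func SumBoxed(msgs []any) int64 {
	var total int64
	for _, m := range msgs {
		if v, ok := m.(int64); ok {
			total += v
		}
	}
	return total
}

func main() {
	fmt.Println(Sum([]Msg{IntMsg(1), IntMsg(2), StrMsg("skip"), IntMsg(3)})) // 6
	fmt.Println(SumBoxed([]any{int64(1), int64(2), "skip", int64(3)}))       // 6
}
```

The trade-off is a larger fixed-size struct per message in exchange for fewer interface conversions; whether that pays off in Neva is what the benchmarks are meant to show.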

Important status note: as of **March 22, 2026**, neither `#1004` nor `#1023` is merged into `main`, so this issue should distinguish between:

- work already prototyped in open PRs
- work still missing from the repository default branch

## Reading the map

- **Fit** = how relevant the pattern is to Neva today.
- **Priority** = suggested order for investigation.
- **Candidate** = a concrete improvement idea for this repository.
- **Status** = whether the work is already in `main`, being prototyped in an open PR, or still backlog.

## 1. Common Go performance patterns mapped to Neva

| goperf topic | Neva fit | Priority | Status | Candidate for Neva |
| --- | --- | --- | --- | --- |
| [Avoiding Interface Boxing](https://goperf.dev/01-common-patterns/interface-boxing/) | Very high | P0 | In progress via [#1004](https://github.com/nevalang/neva/pull/1004) | Rework `runtime.Msg` hot paths, then remeasure and decide whether Neva still needs specialized scalar fast paths, slimmer stream-item representations, or native-composite follow-ups from [#996](https://github.com/nevalang/neva/issues/996). |
| [Memory Efficiency and Go's Garbage Collector](https://goperf.dev/01-common-patterns/gc/) | Very high | P0 | Partially covered by [#1004](https://github.com/nevalang/neva/pull/1004) and [#1023](https://github.com/nevalang/neva/pull/1023) | Use the new microbench/e2e baselines to review per-message wrappers, repeated `StructMsg` construction/copying, and `StructMsg.Get` scans before introducing pools or unsafe tricks. |
| [Stack Allocations and Escape Analysis](https://goperf.dev/01-common-patterns/stack-alloc/) | Very high | P0 | Backlog | Build the runtime/backend packages with `-gcflags=all=-m=2` and review the escape-analysis output on top of the benchmark baseline from [#1023](https://github.com/nevalang/neva/pull/1023) and the runtime changes from [#1004](https://github.com/nevalang/neva/pull/1004). |
| [Memory Preallocation](https://goperf.dev/01-common-patterns/mem-prealloc/) | High | P0 | Backlog | Extend existing preallocation discipline to runtime helpers such as `stream_zip_many`, array-port helpers, and suspicious `len(chans)^2` capacity calculations. |
| [Zero-Copy Techniques](https://goperf.dev/01-common-patterns/zero-copy/) | High | P1 | Backlog | Audit stdlib/runtime I/O. `http_get` still eagerly reads the full response body, and transport semantics around `bytes` vs `string` should stay explicit. |
| [Goroutine Worker Pools](https://goperf.dev/01-common-patterns/worker-pool/) | High | P1 | Backlog | Review fan-out/fan-in helpers that spawn goroutines per slot or per iteration (`ReceiveAll`, `SendAll`, `struct_builder`, `stream_zip_many`, runtime call orchestration). Neva's model is intentionally parallel, so the goal is bounded concurrency for high-cardinality helpers, not “less concurrency everywhere”. |
| [Batching Operations](https://goperf.dev/01-common-patterns/batching-ops/) | High | P1 | Backlog | Add batched variants or internal batching only where benchmarks show per-message overhead dominates useful work. |
| [Immutable Data Sharing](https://goperf.dev/01-common-patterns/immutable-data/) | High | P1 | Backlog | Document and enforce where slices/maps inside messages are treated as immutable payloads versus defensively copied payloads. This is especially relevant for transport-oriented payloads and composite messages. |
| [Atomic Operations and Synchronization Primitives](https://goperf.dev/01-common-patterns/atomic-ops/) | Medium | P2 | Backlog | Measure whether the global send-order counter becomes a contention point under fan-out heavy benchmarks and whether ordering can be made optional on paths that never call `Select`. |
| [Efficient Context Management](https://goperf.dev/01-common-patterns/context/) | Medium | P2 | Backlog | The runtime creates/cancels contexts in `Call`; check whether that shows up in profiles. Separately, stdlib network code still does not thread request-scoped context into `http_get`, so add that propagation. |
| [Efficient Buffering](https://goperf.dev/01-common-patterns/buffered-io/) | Medium | P2 | Backlog | Most compiler codegen already uses `bytes.Buffer` / `strings.Builder`. Higher-value follow-up is in stdlib I/O and transport-facing components. |
| [Struct Field Alignment](https://goperf.dev/01-common-patterns/field-alignment/) | Medium | P2 | Backlog | Run a focused field-alignment pass on runtime/compiler structs and compare against the existing `betteralign` workflow from `Makefile`. |
| [Object Pooling](https://goperf.dev/01-common-patterns/object-pooling/) | Medium | P2 | Backlog | Benchmark whether runtime hot paths allocate enough temporary slices/messages to justify scratch-buffer pooling. Avoid applying pools to compiler codegen before measurement. |
| [Lazy Initialization](https://goperf.dev/01-common-patterns/lazy-init/) | Medium | P3 | Backlog | Cache/lazily initialize cold shared state only if startup profiling shows it matters. |
| [Leveraging Compiler Optimization Flags](https://goperf.dev/01-common-patterns/comp-flags/) | Already covered well | P3 | Mostly in `main` | Keep current release defaults; possible follow-up is a dedicated debug profile with `-gcflags="all=-N -l"`. |
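
One reason a `len(chans)^2` capacity expression deserves a second look: in Go, `^` on integers is bitwise XOR, not exponentiation. The sketch below only demonstrates that language rule. Whether the Neva code intended squaring is an assumption, and the helper names `xorCapacity` and `squaredCapacity` are hypothetical.

```go
package main

import "fmt"

// xorCapacity mirrors the suspicious expression: n ^ 2 is bitwise XOR in Go.
func xorCapacity(n int) int { return n ^ 2 }

// squaredCapacity is what a reader might assume was intended.
func squaredCapacity(n int) int { return n * n }

func main() {
	for _, n := range []int{2, 3, 4} {
		fmt.Printf("n=%d xor=%d squared=%d\n", n, xorCapacity(n), squaredCapacity(n))
	}
	// Prints:
	// n=2 xor=0 squared=4
	// n=3 xor=1 squared=9
	// n=4 xor=6 squared=16

	// With 3 channels, this reserves 1 slot rather than 9.
	buf := make([]int, 0, xorCapacity(3))
	fmt.Println(cap(buf)) // 1
}
```

If the intended capacity is linear or quadratic in the number of channels, writing it as `n` or `n*n` makes the preallocation match the intent.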

## 2. Networking and diagnostics topics: which ones matter to Neva

Most of goperf's networking section targets long-lived Go services. Neva is primarily a compiler/runtime toolchain, so these topics are secondary, but not irrelevant.

| goperf topic | Relevance to Neva | Status / candidate |
| --- | --- | --- |
| [Benchmarking and Load Testing for Networked Go Apps](https://goperf.dev/02-networking/benchmarking-load-testing/) | High as methodology, even outside networking | [#1023](https://github.com/nevalang/neva/pull/1023) is the first concrete step here. Follow-up: land it, document how results are tracked, and pair it with lower-level runtime benchmarks from [#1004](https://github.com/nevalang/neva/pull/1004). |
| [Practical Example: Profiling Networked Go Applications with pprof](https://goperf.dev/02-networking/gc-endpoint-profiling/) | High as diagnostics reference | Add a short perf-playbook doc for Neva: `go test -bench`, `-benchmem`, CPU profiles, alloc profiles, and escape-analysis reports on runtime/compiler packages. |
| [Efficient Use of net/http, net.Conn, and UDP](https://goperf.dev/02-networking/efficient-net-use/) | Medium | Refactor `http_get` to use a reusable `http.Client`, explicit timeouts, request context propagation, and possibly streaming variants. |
| [Managing 10K+ Concurrent Connections](https://goperf.dev/02-networking/10k-connections/) | Low today | Keep as background reading unless Neva grows daemonized services, language servers, or remote execution endpoints inside this repo. |
| Scheduler / epoll / TLS / DNS / connection lifecycle pages | Low today | Background reading only for future infrastructure work. |

## 3. What Neva already does well

Several goperf ideas are already visible either in `main` or in active related PRs:

- The compiler already preallocates maps/slices in analysis and code generation paths where sizes are known ahead of time.
- `main` already has a benchmark entry point in `benchmarks/message_passing`, and [#1023](https://github.com/nevalang/neva/pull/1023) expands that into a broader runtime benchmark suite.
- Release builds already use size-oriented linker flags and reproducibility-friendly build flags.
- Runtime code already uses `strings.Builder`, `bytes.Buffer`, and atomics in a few targeted places instead of overusing them everywhere.

That means the highest-value work is **not** “introduce basic Go performance hygiene”. The highest-value work is to put numbers around the runtime's message-passing cost model and then use those numbers to drive the next runtime changes.

## 4. Recommended sequence for Neva

### P0 — measure and reduce message overhead

1. Rebase and land the benchmark/message work already in flight:
   - [#1023](https://github.com/nevalang/neva/pull/1023) for e2e runtime benchmark baselines.
   - [#1004](https://github.com/nevalang/neva/pull/1004) for runtime `Msg` redesign and lower-level runtime benchmarks.
2. Run `go test -bench=. -benchmem` on the touched runtime packages and benchmark suites.
3. Run escape-analysis reports (`-gcflags=all=-m=2`) for runtime/backend packages.
4. Use the results to decide whether boxing, copying, or goroutine churn is the dominant cost.
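
As a rough illustration of step 2, the sketch below measures allocations for a boxing and a non-boxing loop. It uses `testing.Benchmark` so it runs as a standalone program; in the repository the same bodies would more likely live in `_test.go` files and run under `go test -bench=. -benchmem`. The `boxEach` and `copyEach` functions are hypothetical stand-ins, not Neva runtime code.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep results alive so the work is not optimized away.
var (
	sink    any
	sinkInt int64
)

// boxEach stores every value in an interface, which may heap-allocate.
func boxEach(vals []int64) {
	for _, v := range vals {
		sink = v
	}
}

// copyEach does the same walk without converting to an interface.
func copyEach(vals []int64) {
	for _, v := range vals {
		sinkInt = v
	}
}

func main() {
	vals := make([]int64, 1024)
	for i := range vals {
		vals[i] = int64(i) + 1000
	}

	boxed := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			boxEach(vals)
		}
	})
	plain := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			copyEach(vals)
		}
	})

	fmt.Println("boxed:", boxed.String(), boxed.MemString())
	fmt.Println("plain:", plain.String(), plain.MemString())
}
```

The absolute numbers depend on the machine and Go version; the point is to compare `allocs/op` between variants before and after a change.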

### P1 — fix hot runtime patterns

1. Audit goroutine-per-slot helpers and compare them with bounded worker patterns.
2. Audit byte/string conversions and whole-buffer I/O APIs.
3. Introduce batching only where benchmarks show that message granularity dominates useful work.

### P2 — smaller structural wins

1. Field-alignment pass on selected structs.
2. Ordering/atomic contention review for the global runtime counter.
3. Context propagation and timeout cleanup for stdlib HTTP.

### P3 — polish and tooling

1. Add a documented perf-playbook / debug profile.
2. Cache/lazily initialize cold shared state only if startup profiling shows it matters.
3. Keep this issue updated as concrete benchmark and runtime work lands.

## 5. Suggested first issues / PRs

1. **Rebase and land the benchmark/message work already in flight.**
   - [#1023](https://github.com/nevalang/neva/pull/1023)
   - [#1004](https://github.com/nevalang/neva/pull/1004)
2. **Add a perf-playbook doc** (`bench`, `benchmem`, `pprof`, escape analysis) that explains how to use the new benchmarks and how to compare results before/after runtime changes.
3. **Run escape-analysis and allocation review on top of `#1004` / `#1023`.**
4. **Review goroutine-per-slot helpers for bounded-concurrency alternatives.**
5. **Refactor `http_get` to use context-aware requests and a reusable client**, while keeping transport semantics explicit (`bytes` vs `string`) and aligning with the current stdlib direction.

## Source reference

Primary source: [goperf.dev](https://goperf.dev/)

Key reference pages used while preparing this map:

- <https://goperf.dev/>
- <https://goperf.dev/01-common-patterns/interface-boxing/>
- <https://goperf.dev/01-common-patterns/stack-alloc/>
- <https://goperf.dev/01-common-patterns/atomic-ops/>
- <https://goperf.dev/01-common-patterns/buffered-io/>
- <https://goperf.dev/01-common-patterns/comp-flags/>
- <https://goperf.dev/02-networking/efficient-net-use/>
- <https://goperf.dev/02-networking/gc-endpoint-profiling/>

