Phase 4 — parallel Jacobi sweeps across NeuronCores #29

@scttfrdmn

Roadmap phase tracker

This issue tracks trnsolver's work on Phase 4 of the trnsci roadmap.

See the suite-level roadmap
for the full phase matrix and cross-project dependencies. A reader-friendly
version lives at trnsci.dev/roadmap/.

What this phase means for trnsolver:
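Jacobi rotations on disjoint index pairs commute, so the rotations within
each round of a sweep can be computed and applied in parallel; only the
rounds themselves run one after another. Phase 4 exploits that for large-n eigh.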
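
To make the disjoint-pair structure concrete, here is a minimal sketch of a round-robin (circle-method) pair schedule in plain Python. It is illustrative only and is not taken from trnsolver; the function name `round_robin_rounds` is hypothetical. For even n it yields n - 1 rounds, each holding n / 2 pairs that share no index, so every rotation in a round touches its own rows and columns.

```python
def round_robin_rounds(n):
    """Return n - 1 rounds, each a list of n // 2 disjoint index pairs.

    Hypothetical helper for illustration; assumes n is even.
    """
    if n % 2 != 0:
        raise ValueError("n must be even (pad the matrix by one index otherwise)")
    order = list(range(n))
    rounds = []
    for _ in range(n - 1):
        # Pair the i-th entry from the front with the i-th from the back.
        pairs = [tuple(sorted((order[i], order[n - 1 - i]))) for i in range(n // 2)]
        rounds.append(pairs)
        # Circle method: keep order[0] fixed, rotate the rest by one position.
        order = [order[0], order[-1]] + order[1:-1]
    return rounds


if __name__ == "__main__":
    for r, pairs in enumerate(round_robin_rounds(4)):
        print(r, pairs)
```

Across the n - 1 rounds every pair (p, q) with p < q appears exactly once, which is what one full sweep needs to cover.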

Done means:

  • Multi-chip Jacobi: disjoint rotation pairs assigned per NeuronCore, sweep convergence check via all-reduce.
  • Large SCF eigh (n > 4096) runs on trn1.32xlarge within memory.
  • Benchmark results published at trnsci.dev/trnsolver/benchmarks/.
  • Parallel CG / GMRES with sharded preconditioners (depends on Phase 3).
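
As a rough sketch of the first item, the plain-Python code below simulates one way the pieces could fit together: the pairs of each round are dealt out to `num_cores` simulated workers, each worker derives its rotation angles from the same matrix snapshot, and convergence is tested by summing per-worker partial off-diagonal norms. Everything here is an assumption made for illustration: the function names, the `pairs[k::num_cores]` assignment, and the tolerance are hypothetical, no Neuron SDK call is used, and the built-in `sum` stands in for a real all-reduce.

```python
import math


def schedule(n):
    """Circle-method rounds: n - 1 rounds of n // 2 disjoint pairs (n even)."""
    order = list(range(n))
    for _ in range(n - 1):
        yield [(order[i], order[n - 1 - i]) for i in range(n // 2)]
        order = [order[0], order[-1]] + order[1:-1]


def rotation(A, p, q):
    """Return (c, s) of the rotation that zeroes A[p][q] in symmetric A."""
    if A[p][q] == 0.0:
        return 1.0, 0.0
    tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
    t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, t * c


def apply_rotation(A, p, q, c, s):
    """In place: A <- J^T A J for the plane rotation J in the (p, q) plane."""
    n = len(A)
    for k in range(n):  # update columns p and q
        akp, akq = A[k][p], A[k][q]
        A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
    for k in range(n):  # update rows p and q
        apk, aqk = A[p][k], A[q][k]
        A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk


def local_off_norm_sq(A, rows):
    """One worker's share of the squared off-diagonal Frobenius norm."""
    return sum(A[i][j] ** 2 for i in rows for j in range(len(A)) if i != j)


def jacobi_eigvals(A, num_cores=2, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via parallel-order Jacobi sweeps."""
    A = [row[:] for row in A]
    n = len(A)
    for _ in range(max_sweeps):
        for pairs in schedule(n):
            # Deal the disjoint pairs of this round out to the simulated cores.
            shards = [pairs[k::num_cores] for k in range(num_cores)]
            # Each core computes its angles from the same snapshot of A.
            rots = [[(p, q, *rotation(A, p, q)) for p, q in shard] for shard in shards]
            # Disjoint rotations commute, so the order of application is irrelevant.
            for shard in rots:
                for p, q, c, s in shard:
                    apply_rotation(A, p, q, c, s)
        partials = [local_off_norm_sq(A, range(k, n, num_cores)) for k in range(num_cores)]
        if sum(partials) < tol:  # stand-in for an all-reduce(sum) across cores
            break
    return sorted(A[i][i] for i in range(n))


if __name__ == "__main__":
    T = [[2.0, -1.0, 0.0, 0.0],
         [-1.0, 2.0, -1.0, 0.0],
         [0.0, -1.0, 2.0, -1.0],
         [0.0, 0.0, -1.0, 2.0]]
    print(jacobi_eigvals(T, num_cores=2))
```

Because the rotations in a round touch disjoint index pairs, the result does not depend on `num_cores`; only the partitioning of work changes. A real multi-core version would also have to exchange the updated rows and columns between rounds, which this single-process sketch sidesteps by sharing one matrix.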

Coordination

  • Label: phase-4-multichip — matches the same label in every sub-project.
  • Cross-project dependencies, if any, are called out in the "Done means"
    list above. Link any PRs / child issues here for tracking.

Close this issue when all "Done means" items are satisfied.
