
NKI compiler emits trn2-only shared-memory on trn1 for fused XLA graph #39

@scttfrdmn


Status: open. This is an AWS Neuron SDK limitation, not a trntensor bug.

Symptom: When the full DF-MP2 pipeline runs with all operands pre-pinned on XLA —

B_x = trntensor.ao_to_mo_transform(eri_x, C_occ_x, C_vir_x)  # OK
E_x = trntensor.mp2_energy(B_x, eps_occ_x, eps_vir_x)         # FAILS

— the NKI compiler raises:

Shared memory is only supported on trn2, but inst__I-7-0:_mem_0_0_set
is using Shared memory on an unsupported target

When both kernels are traced into a single XLA lazy graph, the compiler takes a code-gen path that emits trn2-specific shared-memory instructions, which the verifier rejects on trn1. Each kernel compiles fine when called in isolation.

What we tried:

Practical effect: users currently have to call from_xla on B between the two kernel calls, which defeats residency for this pipeline. test_pipeline_composition is marked @pytest.mark.skip with a reason referencing this issue; the other residency tests (test_matmul_stays_on_xla, test_residency_speedup) pass.
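Until the SDK fix lands, one way to keep the resident path wherever it works is to try the fused pipeline first and fall back to the from_xla round trip only when this specific verifier error appears. This is a hypothetical sketch: run_with_split_fallback and is_trn1_shared_memory_error are illustrative names, not trntensor APIs, and it assumes the compile failure surfaces as a Python exception whose message contains the text quoted above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the NKI verifier message reported in this issue.
TRN1_SHARED_MEMORY_ERROR = "Shared memory is only supported on trn2"


def is_trn1_shared_memory_error(exc: BaseException) -> bool:
    """Return True if exc carries the trn2-only shared-memory verifier message."""
    return TRN1_SHARED_MEMORY_ERROR in str(exc)


def run_with_split_fallback(fused: Callable[[], T], split: Callable[[], T]) -> T:
    """Run the fully resident pipeline; on the trn1 shared-memory error,
    rerun it with an explicit host round trip between the two kernels.

    `fused` must force execution of the lazy graph (e.g. by reading the
    result back), otherwise the compile error can surface later, outside
    this try block.
    """
    try:
        return fused()
    except Exception as exc:  # assumption: the exact exception type is unknown
        if not is_trn1_shared_memory_error(exc):
            raise  # unrelated failures are not masked
        return split()
```

In this issue's pipeline, fused would run ao_to_mo_transform and mp2_energy on the XLA-resident operands and read back the energy, while split would do the same with from_xla applied to B in between. The string match is deliberately narrow so that unrelated compiler errors still propagate.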

Escalation path: per the NKI error message, open an AWS Neuron SDK issue at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. Attach the HLO module dump from a reproduction, captured with XLA_IR_DEBUG=1 and XLA_HLO_DEBUG=1 set.
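A minimal sketch for capturing that dump, assuming the reproduction lives in a standalone script (repro_issue39.py is a hypothetical name). Launching it as a child process guarantees the flags are set before torch_xla is imported. XLA_SAVE_TENSORS_FILE and XLA_SAVE_TENSORS_FMT come from the torch-xla troubleshooting docs, not from this issue; whether the Neuron backend honors them the same way is an assumption.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def xla_debug_env(dump_file: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with torch-xla debug flags set."""
    env = dict(os.environ if base is None else base)  # never mutate the caller's mapping
    env["XLA_IR_DEBUG"] = "1"   # capture Python stack traces on IR nodes
    env["XLA_HLO_DEBUG"] = "1"  # propagate those frames into HLO metadata
    # Assumption: torch-xla writes the lazy graphs to this file in HLO text form.
    env["XLA_SAVE_TENSORS_FILE"] = dump_file
    env["XLA_SAVE_TENSORS_FMT"] = "hlo"
    return env


def run_repro(script: str, dump_file: str) -> int:
    """Run a reproduction script in a child process with the debug environment."""
    result = subprocess.run([sys.executable, script], env=xla_debug_env(dump_file))
    return result.returncode
```

For example, run_repro("repro_issue39.py", "/tmp/issue39_graphs.hlo"), then attach the resulting file together with the full compiler error.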

Linked: #34 (residency, shipped), #35 (mark_step investigation, which now has a concrete case), #38 (eps reshape; closing separately because the reshape is still correct).

Metadata

Assignees

No one assigned

    Labels

    infra (CI, tooling, deployment, repo hygiene)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions