Sirl #1 (Open)

flowerfox-scitix wants to merge 5 commits into scitix_main from sirl

Sirl#1
flowerfox-scitix wants to merge 5 commits into
scitix_mainfrom
sirl

Conversation

@flowerfox-scitix

No description provided.

flowerfox and others added 5 commits April 16, 2026 15:29
Base: scitix/Megatron-LM core_v0.16.1 (55ac708)
Patch: slime/docker/patch/v0.5.7/megatron.patch (17 files)

Applied cleanly: 11/17 files
Conflicts resolved: 6/17 files
  - transformer_engine.py: kept ours (core_v0.16.1 has quant_context)
  - distrib_optimizer.py: took theirs (skip step key)
  - moe_utils.py: kept ours (core_v0.16.1 native RouterReplay)
  - router.py: kept ours (native RouterReplay, no slime import needed)
  - gpt_layer_specs.py: kept ours (full MLA + non-MLA spec construction)
  - gpt_model.py: kept both padding_mask + mtp_kwargs params,
    kept ours for MTP label handling and loss computation

Key: routing replay is now native in core_v0.16.1 via
megatron.core.transformer.moe.router_replay.RouterReplay.
No external slime/sirl imports needed for this feature.

Co-Authored-By: Claude <noreply@anthropic.com>
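
For intuition, here is a toy sketch of the routing-replay mechanism (record the top-k expert choices on one pass, force the identical choices on a later pass). This illustrates the idea only; it is not the API of the RouterReplay class at the module path above:

    import torch

    class ToyRouterReplay:
        """Toy illustration of routing replay, NOT the Megatron class:
        record top-k expert choices on one forward pass, then force
        the identical choices on a later pass."""

        def __init__(self):
            self.buffer = []

        def record(self, router_logits, k):
            # First pass: store the chosen expert indices per token.
            _, indices = torch.topk(router_logits, k, dim=-1)
            self.buffer.append(indices)
            return indices

        def replay(self):
            # Later pass: reuse the stored indices instead of re-routing,
            # so expert assignment matches the recorded pass exactly.
            return self.buffer.pop(0)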
Core v0.16.1 has native RouterReplay, but sirl's actor.py fills
replay buffers via sirl.utils.routing_replay.RoutingReplay instances.
This hook registers with sirl's adapter so both systems stay in sync.

Co-Authored-By: Claude <noreply@anthropic.com>
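
A hedged sketch of what such a registration hook could look like; the adapter entry point is an assumed name, and only the sirl.utils.routing_replay module path comes from the commit message:

    def register_native_replay_with_sirl(router_replay):
        # Sketch only: register_external_buffer is a HYPOTHETICAL
        # name, not a confirmed sirl API.
        try:
            from sirl.utils import routing_replay as sirl_replay
        except ImportError:
            return  # sirl absent: native RouterReplay still works alone
        # Hypothetical call: point sirl's RoutingReplay machinery at the
        # core RouterReplay instance so actor.py fills the same buffers
        # that core_v0.16.1 reads during training.
        sirl_replay.register_external_buffer(router_replay)  # assumed API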
…ay guard

Co-Authored-By: Claude <noreply@anthropic.com>
TransformerConfig fields (normalization, apply_residual_connection_post_layernorm,
post_self_attn_layernorm, post_mlp_layernorm) are auto-registered by
ArgumentGroupFactory. Remove manual add_argument to avoid argparse conflict.

Keep only args NOT in TransformerConfig: norm-epsilon, apply-layernorm-1p,
use-gated-attention.

Co-Authored-By: Claude <noreply@anthropic.com>
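
The argparse conflict being avoided is standard-library behavior: registering the same option string twice raises argparse.ArgumentError. A self-contained reproduction (the flag name is illustrative):

    import argparse

    parser = argparse.ArgumentParser()
    # Stands in for ArgumentGroupFactory's auto-registration:
    parser.add_argument("--normalization")
    try:
        # The manual add_argument that the commit removes:
        parser.add_argument("--normalization")
    except argparse.ArgumentError as err:
        print(err)  # argument --normalization: conflicting option string: --normalization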
ArgumentGroupFactory(TransformerConfig) auto-registers: normalization,
norm-epsilon, apply-layernorm-1p, apply-residual-connection-post-layernorm,
post-self-attn-layernorm, post-mlp-layernorm. All removed from manual
add_argument. Only --use-gated-attention remains (not in TransformerConfig).

Co-Authored-By: Claude <noreply@anthropic.com>
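
A sketch of what the surviving manual registration might reduce to (the group title and help text are assumptions; only --use-gated-attention comes from the commit message):

    def add_extra_transformer_args(parser):
        # Post-cleanup state: every flag that maps to a TransformerConfig
        # field is left to ArgumentGroupFactory; only the flag with no
        # config counterpart is added by hand.
        group = parser.add_argument_group(title="extra transformer args")
        group.add_argument("--use-gated-attention", action="store_true",
                           help="Enable gated attention (no TransformerConfig field).")
        return parser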