transformers #43775

fix(moe): normalize auxiliary loss by top_k for correct load balancing

by Mr-Neutr0n · Feb 06, 2026 at 19:25 UTC · scan-2f4790caa8e29896

Critical Risk (75%)

Risk Assessment

Risk level: Critical (75%)

Risk Drivers

  • high_file_spread: Touches 22 files
  • multiple_concerns: Spans 21 directories
  • new_contributor: First contribution from Mr-Neutr0n
  • missing_tests: Added 154 lines of code but no lines of tests

Intent

3/3 criteria met

Normalize auxiliary loss by top_k for correct load balancing in MoE models

Acceptance Criteria

  • ✓ Normalize tokens_per_expert by top_k to ensure sum(f_i) = 1

    Each model file now divides tokens_per_expert by top_k

  • ✓ Ensure the normalization matches expected behavior for MoE models

    Normalization changes are consistent across all affected files

  • ✓ Correct erroneous load balancing loss calculations

    Comments indicate correction based on https://github.com/huggingface/transformers/issues/43688
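
To make the first criterion concrete, here is a minimal pure-Python sketch of why dividing by top_k matters. It is not the code from the PR (the diff is not shown in this report). It assumes a Switch-Transformer-style auxiliary loss of the form num_experts * sum_i(f_i * P_i), where f_i is the fraction of routing slots assigned to expert i and P_i is the mean router probability for expert i. The function name `load_balancing_loss` and the `normalize_by_top_k` flag are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def load_balancing_loss(
    router_probs: List[List[float]],
    top_k: int,
    normalize_by_top_k: bool = True,
) -> Tuple[float, List[float]]:
    """Illustrative auxiliary loss; hypothetical helper, not the transformers API.

    router_probs holds one probability row per token (each row sums to 1).
    Returns (loss, tokens_per_expert).
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])

    # Count how often each expert appears among the top_k choices of a token.
    counts = [0] * num_experts
    for probs in router_probs:
        ranked = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)
        for expert in ranked[:top_k]:
            counts[expert] += 1

    # Each token contributes top_k selections, so these fractions sum to top_k.
    tokens_per_expert = [c / num_tokens for c in counts]
    if normalize_by_top_k:
        # Dividing by top_k restores sum(f_i) = 1.
        tokens_per_expert = [f / top_k for f in tokens_per_expert]

    # Mean router probability per expert (P_i); these sum to 1.
    mean_prob = [
        sum(row[e] for row in router_probs) / num_tokens
        for e in range(num_experts)
    ]

    loss = num_experts * sum(f * p for f, p in zip(tokens_per_expert, mean_prob))
    return loss, tokens_per_expert


# Four tokens, four experts, top_k = 2, constructed so routing is perfectly balanced.
balanced_probs = [
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.4, 0.1, 0.2],
    [0.2, 0.1, 0.4, 0.3],
]

normalized_loss, normalized_f = load_balancing_loss(balanced_probs, top_k=2)
raw_loss, raw_f = load_balancing_loss(balanced_probs, top_k=2, normalize_by_top_k=False)

print("normalized:", normalized_loss, "sum(f_i) =", sum(normalized_f))
print("unnormalized:", raw_loss, "sum(f_i) =", sum(raw_f))
```

Under this assumed formula, perfectly balanced routing gives a loss of about 1 with normalization and about top_k without it, so the unnormalized value scales with top_k rather than with how unbalanced the routing actually is.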

Confidence: 95.0% · Source: PR description · AI: OpenAI

Contributors

Mr-Neutr0n · PR Author · 3 commits · New Contributor
Account Age: 2108 days
Prior PRs: 1

First-time contributor to this repository. Maintains 79 public repositories. Unfamiliar with 10 files being modified.

Evidence

Evidence Completeness: 10.0%
tests_passing Failing
Missing: ci_passing, lint_passing, security_scan_clean, coverage_maintained, build_successful

Supply Chain

Risk level: None
Modifies dependencies
Modifies lockfile
Modifies CI config
Modifies build scripts

Focus Files

Review 22 file(s)

  • src/transformers/models/dbrx/modeling_dbrx.py +10 (Source code, medium)
  • src/transformers/models/ernie4_5_moe/modeling_ernie4_5_moe.py +10 (Source code, medium)
  • src/transformers/models/ernie4_5_vl_moe/modeling_ernie4_5_vl_moe.py +10 (Source code, medium)
  • src/transformers/models/flex_olmo/modeling_flex_olmo.py +10 (Source code, medium)
  • src/transformers/models/glm4v_moe/modeling_glm4v_moe.py +10 (Source code, medium)
  • src/transformers/models/gpt_oss/modeling_gpt_oss.py +10 (Source code, medium)
  • src/transformers/models/granitemoe/modeling_granitemoe.py +10 (Source code, medium)
  • src/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py +10 (Source code, medium)
  • src/transformers/models/granitemoeshared/modeling_granitemoeshared.py +10 (Source code, medium)
  • src/transformers/models/jamba/modeling_jamba.py +10 (Source code, medium)

+12 more files

Triage

Minutes to review: 76
Effort level: extensive
Staleness risk: none

Schedule dedicated review time; consider pair review

Recommendation

REQUEST CHANGES 22.0% readiness

Critical risk level requires changes before approval

Next Steps

  • Question: Why is ci_passing missing? Consider adding this check.
  • Question: Why is lint_passing missing? Consider adding this check.
  • Nitpick: First contribution; consider welcoming the author and providing extra context.