fix(moe): normalize auxiliary loss by top_k for correct load balancing
by Mr-Neutr0n
·
Feb 06, 2026 at 19:25 UTC
·
scan-2f4790caa8e29896
Risk level: Critical (75%)
Normalize auxiliary loss by top_k for correct load balancing in MoE models
Each model file now divides tokens_per_expert by top_k
Normalization changes are consistent across all affected files
Comments cite https://github.com/huggingface/transformers/issues/43688 as the motivation for the correction
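To illustrate what the normalization changes: in the standard Switch-Transformers-style auxiliary loss, each token contributes top_k routing assignments, so the raw per-expert counts sum to top_k rather than 1. Dividing by top_k restores the fraction-of-tokens interpretation that the loss formula assumes. The sketch below is a simplified, hypothetical version of such a loss function (not the exact code from the modified files) showing where the top_k division lands:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits, num_experts, top_k):
    """Simplified sketch of a Switch-style auxiliary load-balancing loss."""
    routing_weights = torch.softmax(router_logits, dim=-1)    # (tokens, experts)
    _, selected = torch.topk(routing_weights, top_k, dim=-1)  # (tokens, top_k)
    expert_mask = F.one_hot(selected, num_experts)            # (tokens, top_k, experts)
    # Each token selects top_k experts, so the raw assignment counts sum to
    # top_k per token. Dividing by top_k (the normalization this PR adds)
    # makes tokens_per_expert a proper fraction that sums to 1 over experts.
    num_tokens = router_logits.shape[0]
    tokens_per_expert = expert_mask.float().sum(dim=(0, 1)) / (num_tokens * top_k)
    router_prob_per_expert = routing_weights.mean(dim=0)
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```

Under perfectly balanced routing this normalized loss evaluates to 1.0 regardless of top_k; without the division it scales with top_k, which is the imbalance the PR corrects.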
First-time contributor to this repository; maintains 79 public repositories; unfamiliar with 10 of the files being modified.
Review 22 file(s)
src/transformers/models/dbrx/modeling_dbrx.py
+10
Source code
src/transformers/models/ernie4_5_moe/modeling_ernie4_5_moe.py
+10
Source code
src/transformers/models/ernie4_5_vl_moe/modeling_ernie4_5_vl_moe.py
+10
Source code
src/transformers/models/flex_olmo/modeling_flex_olmo.py
+10
Source code
src/transformers/models/glm4v_moe/modeling_glm4v_moe.py
+10
Source code
src/transformers/models/gpt_oss/modeling_gpt_oss.py
+10
Source code
src/transformers/models/granitemoe/modeling_granitemoe.py
+10
Source code
src/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py
+10
Source code
src/transformers/models/granitemoeshared/modeling_granitemoeshared.py
+10
Source code
src/transformers/models/jamba/modeling_jamba.py
+10
Source code
+12 more files
Estimated review time: 76 minutes
Effort level: extensive
Staleness risk: none
Schedule dedicated review time; consider pair review
Critical risk level requires changes before approval
The ci_passing check is missing; consider adding it.
The lint_passing check is missing; consider adding it.
First contribution - consider welcoming and providing extra context