transformers #43771

fix: Add MXFP4 MoE/attention backward kernels

by leoneperdigao · Feb 06, 2026 at 19:26 UTC · scan-f05078364281ec84

High Risk (50%)

Risk Assessment

Risk level: High (50%)

Risk Drivers

  • large_diff: Large change: 1529 lines modified
  • api_surface_change: API surface changed in 5 file(s)

Intent

5/5 criteria met

Implement backward pass support for MXFP4 quantized weights to enable LoRA/adapter fine-tuning.

Non-Goals

  • Support for weight gradients (dW): quantized weights must remain frozen.
  • Enabling full fine-tuning: only LoRA/adapter fine-tuning is supported.
  • Replacing existing forward kernels.
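
The first non-goal implies that the backward pass returns a gradient for the activations only, never for the quantized weight. A minimal sketch of that idea in plain Python follows. It is not the PR's Triton kernel: the weight is assumed to be already dequantized into a list of floats, and the helper names (`matmul`, `transpose`, `linear_backward_input_only`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def linear_forward(x: Matrix, w: Matrix) -> Matrix:
    """y = x @ w, where w stands in for a frozen (dequantized) weight."""
    return matmul(x, w)


def linear_backward_input_only(grad_y: Matrix, w: Matrix) -> Matrix:
    """Return dX = dY @ w^T. No dW is computed because w is frozen."""
    return matmul(grad_y, transpose(w))


# Hypothetical example values.
x = [[1.0, 2.0]]                          # 1 x 2 activation
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]    # 2 x 3 frozen weight
y = linear_forward(x, w)
grad_x = linear_backward_input_only([[1.0, 1.0, 1.0]], w)
```

Because only dX flows back, trainable adapter parameters placed before or alongside the frozen layer can still receive gradients, which is what LoRA-style fine-tuning needs.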

Acceptance Criteria

  • Implement SwiGLU Backward Kernel

    New file src/transformers/integrations/mxfp4_backward.py added for Triton kernel implementation.

  • MatmulOGS Autograd Wrapper implemented

    Transformer integration updated with MatmulOGSFunction in mxfp4_backward module.

  • MoE Routing Gradient Inversion handled

    MoE routing gradient handled in mxfp4_backward.py with gradient inversion logic.

  • Training Mode Integration

    Training mode toggle implemented in mxfp4.py with enable_training_mode method.

  • Tests added/updated and passing

    Comprehensive tests added in tests/quantization/mxfp4/test_mxfp4_backward.py.

Confidence: 95.0% · Source: PR description · AI: openai
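
For reviewers checking the SwiGLU backward criterion, the sketch below shows the gradients of the textbook SwiGLU form, out = silu(gate) * up, in plain Python with a finite-difference check. This is an assumption for illustration: the kernel in mxfp4_backward.py may use a different variant (for example with clamping or a scaling factor), and the function names here are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def swiglu_forward(gate: float, up: float) -> float:
    """Textbook SwiGLU: silu(gate) * up, with silu(g) = g * sigmoid(g)."""
    return gate * sigmoid(gate) * up


def swiglu_backward(gate: float, up: float, grad_out: float):
    """Return (d_gate, d_up) for the textbook form above."""
    s = sigmoid(gate)
    silu = gate * s
    # d silu / d gate = s * (1 + gate * (1 - s))
    dsilu = s * (1.0 + gate * (1.0 - s))
    d_gate = grad_out * up * dsilu
    d_up = grad_out * silu
    return d_gate, d_up


def central_difference(f, x: float, eps: float = 1e-6) -> float:
    """Numerical derivative used only to sanity-check the analytic gradient."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Hypothetical example values.
gate, up, grad_out = 0.7, -1.3, 2.0
d_gate, d_up = swiglu_backward(gate, up, grad_out)
num_d_gate = grad_out * central_difference(lambda g: swiglu_forward(g, up), gate)
num_d_up = grad_out * central_difference(lambda u: swiglu_forward(gate, u), up)
```

Comparing analytic and numerical gradients in this way is a common pattern for backward-kernel tests, although the report does not state which checks test_mxfp4_backward.py actually performs.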

Contributors

leoneperdigao · PR Author · 5 commits · Low Trust
Account Age: 4238 days
Prior PRs: 2
Merged: 1

Has 1 merged PR to this repo. Maintains 55 public repositories. Unfamiliar with 6 files being modified.

Evidence

Evidence Completeness: 10.0%
tests_passing: Failing
Missing: ci_passing, lint_passing, security_scan_clean, coverage_maintained, build_successful

Supply Chain

None Risk
Modifies dependencies
Modifies lockfile
Modifies CI config
Modifies build scripts

Focus Files

Focus on 3 critical file(s)

  • benchmarks/benchmark_mxfp4_backward.py +219 (critical): 219 lines changed; New file; Source code
  • src/transformers/integrations/mxfp4_backward.py +866 (critical): 866 lines changed; New file; Source code
  • tests/quantization/mxfp4/test_mxfp4_backward.py +311 (critical): 311 lines changed; New file; Source code
  • src/transformers/integrations/mxfp4.py +83 (high): 83 lines changed; Source code
  • src/transformers/quantizers/quantizer_mxfp4.py +42 (medium): Source code
  • src/transformers/integrations/__init__.py +8 (medium): Source code

Triage

Minutes to review: 169
Effort level: high
Staleness risk: none

Allocate focused review time

Recommendation

NEEDS DISCUSSION 34.0% readiness

Insufficient evidence (CI/tests) to evaluate

Next Steps

Concern: Consider breaking into smaller PRs.

Question: Why is ci_passing missing? Consider adding this check.

Question: Why is lint_passing missing? Consider adding this check.

Concern (benchmarks/benchmark_mxfp4_backward.py): Critical file: 219 lines changed; New file; Source code

Concern (src/transformers/integrations/mxfp4_backward.py): Critical file: 866 lines changed; New file; Source code

Concern (tests/quantization/mxfp4/test_mxfp4_backward.py): Critical file: 311 lines changed; New file; Source code