fix: Add MXFP4 MoE/attention backward kernels
by leoneperdigao · Feb 06, 2026 at 19:26 UTC · scan-f05078364281ec84
Risk level: High (50%)
Implement backward pass support for MXFP4 quantized weights to enable LoRA/adapter fine-tuning.
New file src/transformers/integrations/mxfp4_backward.py added for Triton kernel implementation.
Transformers integration updated to use MatmulOGSFunction from the mxfp4_backward module.
MoE routing gradient handled in mxfp4_backward.py with gradient inversion logic.
Training mode toggle implemented in mxfp4.py with enable_training_mode method.
Comprehensive tests added in tests/quantization/mxfp4/test_mxfp4_backward.py.
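The summary above describes backward-pass support for frozen MXFP4 weights so that LoRA/adapter parameters can be fine-tuned. The PR's actual Triton kernels are not reproduced here; as an illustrative sketch only, the underlying math amounts to dequantizing block-scaled FP4 weights and routing gradients to the activations. The block size and E2M1 value grid below follow the OCP Microscaling (MX) format; the function names are hypothetical and not from the PR.

```python
import numpy as np

# FP4 (E2M1) representable magnitudes and the MX block size; both follow the
# OCP Microscaling spec. Everything else here is a hypothetical illustration.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32

def mxfp4_fake_quantize(w):
    """Fake-quantize a 1-D weight vector: per-block power-of-two shared
    scale, elements snapped to the nearest FP4 magnitude (sign preserved).
    The result mirrors what a dequantized MXFP4 weight looks like."""
    w = np.asarray(w, dtype=np.float64)
    out = np.empty_like(w)
    for start in range(0, w.size, BLOCK):
        blk = w[start:start + BLOCK]
        amax = np.abs(blk).max()
        # shared exponent chosen so the largest magnitude lands near the top
        # of the FP4 range (6.0 = 1.5 * 2**2, hence the -2)
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2) if amax > 0 else 1.0
        scaled = blk / scale
        # snap each magnitude to the nearest grid point, keep signs
        idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
        out[start:start + BLOCK] = np.sign(scaled) * FP4_GRID[idx] * scale
    return out

def quantized_linear_backward(grad_out, w_deq):
    """Backward of y = x @ w_deq.T with frozen quantized weights: only the
    activation gradient is needed (LoRA adapters receive it downstream)."""
    return grad_out @ w_deq  # dL/dx = dL/dy @ W
```

Because the quantized base weights stay frozen, the backward kernel never needs dL/dW for them; it only dequantizes W to compute dL/dx, which is what makes adapter fine-tuning over MXFP4 feasible.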
Author has 1 merged PR in this repo, maintains 55 public repositories, and is unfamiliar with 6 of the files being modified.
Focus on 3 critical files:
benchmarks/benchmark_mxfp4_backward.py · +219 lines changed · new file · source code
src/transformers/integrations/mxfp4_backward.py · +866 lines changed · new file · source code
tests/quantization/mxfp4/test_mxfp4_backward.py · +311 lines changed · new file · source code
Other changed files:
src/transformers/integrations/mxfp4.py · +83 lines changed · source code
src/transformers/quantizers/quantizer_mxfp4.py · +42 lines changed · source code
src/transformers/integrations/__init__.py · +8 lines changed · source code
Estimated review time: 169 minutes
Effort level: high
Staleness risk: none
Allocate focused review time
Insufficient evidence (CI/tests) to evaluate
Consider breaking into smaller PRs
Why is ci_passing missing? Consider adding this check.
Why is lint_passing missing? Consider adding this check.