chore(deps): Update dependency transformers to v4.57.6 (#408)
> ℹ️ **Note**
>
> This PR body was truncated due to platform limits.
This PR contains the following updates:
| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [transformers](https://redirect.github.com/huggingface/transformers) | `==4.49.0` → `==4.57.6` | | |
---
### Release Notes
<details>
<summary>huggingface/transformers (transformers)</summary>
### [`v4.57.6`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.57.6): Patch release v4.57.6
[Compare
Source](https://redirect.github.com/huggingface/transformers/compare/v4.57.5...v4.57.6)
#### What's Changed
This release contains another fix for Qwen VL models, where a bug
prevented the associated model type from loading correctly; it works
together with
[#​41808](https://redirect.github.com/huggingface/transformers/pull/41808)
from the previous patch release.
- Fixed incorrect model\_type for qwen2vl and qwen2.5vl when config is
saved and loaded again by
[@​i3hz](https://redirect.github.com/i3hz) in
[#​41758](https://redirect.github.com/huggingface/transformers/pull/41758)
**Full Changelog**:
<https://github.com/huggingface/transformers/compare/v4.57.5...v4.57.6>
### [`v4.57.5`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.57.5): Patch release v4.57.5
[Compare
Source](https://redirect.github.com/huggingface/transformers/compare/v4.57.4...v4.57.5)
#### What's Changed
We should not have said "last patch" :wink: These should be the last
remaining fixes that got lost between patches and the transition to
v5.
- QwenVL: add skipped keys in setattr as well by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​41808](https://redirect.github.com/huggingface/transformers/pull/41808)
- Fix lr\_scheduler\_parsing by
[@​SunMarc](https://redirect.github.com/SunMarc) in
[#​41322](https://redirect.github.com/huggingface/transformers/pull/41322)
**Full Changelog**:
<https://github.com/huggingface/transformers/compare/v4.57.4...v4.57.5>
### [`v4.57.4`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.57.4): Patch release v4.57.4
[Compare
Source](https://redirect.github.com/huggingface/transformers/compare/v4.57.3...v4.57.4)
#### What's Changed
This is the last patch release for v4. It contains a few small fixes for
remote generation methods (e.g. group beam search), vLLM, and an offline
tokenizer fix (for tokenizers that have already been cached).
- Grouped beam search from config params by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​42472](https://redirect.github.com/huggingface/transformers/pull/42472)
- Handle decorator with optional arguments better by
[@​hmellor](https://redirect.github.com/hmellor) in
[#​42512](https://redirect.github.com/huggingface/transformers/pull/42512)
- fix: make mistral base check conditional to fix offline loading by
[@​Killusions](https://redirect.github.com/Killusions) in
[#​42880](https://redirect.github.com/huggingface/transformers/pull/42880)
#### New Contributors
- [@​Killusions](https://redirect.github.com/Killusions) made
their first contribution in
[#​42880](https://redirect.github.com/huggingface/transformers/pull/42880)
**Full Changelog**:
<https://github.com/huggingface/transformers/compare/v4.57.3...v4.57.4>
### [`v4.57.3`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.57.3): Patch release v4.57.3
[Compare
Source](https://redirect.github.com/huggingface/transformers/compare/v4.57.2...v4.57.3)
There was a hidden bug when loading models with `local_files_only=True`
and a typo related to the recent patch.
The main fix is:
[`b605555`](https://redirect.github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a).
We are really sorry that this slipped through; our CIs just did not
catch it. As it affects a lot of users, we are going to yank the previous
release.
### [`v4.57.2`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.57.2): Patch Release v4.57.2
[Compare
Source](https://redirect.github.com/huggingface/transformers/compare/v4.57.1...v4.57.2)
This patch most notably fixes an issue with some Mistral tokenizers. It
contains the following commits:
- Add AutoTokenizer mapping for mistral3 and ministral
([#​42198](https://redirect.github.com/huggingface/transformers/issues/42198))
- Auto convert tekken.json
([#​42299](https://redirect.github.com/huggingface/transformers/issues/42299))
- fix tekken pattern matching
([#​42363](https://redirect.github.com/huggingface/transformers/issues/42363))
- Check model inputs - hidden states
([#​40994](https://redirect.github.com/huggingface/transformers/issues/40994))
- Remove invalid `@staticmethod` from module-level
get\_device\_and\_memory\_breakdown
([#​41747](https://redirect.github.com/huggingface/transformers/issues/41747))
### [`v4.57.1`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.57.1): Patch release v4.57.1
[Compare
Source](https://redirect.github.com/huggingface/transformers/compare/v4.57.0...v4.57.1)
This patch most notably fixes an issue with an optional dependency
(`optax`), which resulted in parsing errors with `poetry`. It contains
the following fixes:
- [fix optax dep
issue](https://redirect.github.com/huggingface/transformers/commit/0645c9ec3188e000aecf5060e2cdabcc156bb794)
- [remove offload\_state\_dict from
kwargs](https://redirect.github.com/huggingface/transformers/commit/a92b1e8a45e1863b95c5e2caa12f5597aee80279)
- Fix bnb fsdp loading for pre-quantized checkpoint
([#​41415](https://redirect.github.com/huggingface/transformers/issues/41415))
- Fix tests fsdp
([#​41422](https://redirect.github.com/huggingface/transformers/issues/41422))
- Fix trainer for py3.9
([#​41359](https://redirect.github.com/huggingface/transformers/issues/41359))
### [`v4.57.0`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.57.0): Qwen3-Next, Vault Gemma, Qwen3 VL, LongCat Flash, Flex OLMO, LFM2 VL, BLT, Qwen3 OMNI MoE, Parakeet, EdgeTAM, OLMO3
[Compare
Source](https://redirect.github.com/huggingface/transformers/compare/v4.56.2...v4.57.0)
#### New model additions
##### Qwen3 Next
<img width="1200" height="511" alt="image"
src="https://github.com/user-attachments/assets/3abad6c4-5650-412d-a831-f8a30a5d962e"
/>
The Qwen3-Next series represents the Qwen team's next-generation
foundation models, optimized for extreme context length and large-scale
parameter efficiency.
The series introduces a suite of architectural innovations designed to
maximize performance while minimizing computational cost:
- **Hybrid Attention**: Replaces standard attention with the combination
of **Gated DeltaNet** and **Gated Attention**, enabling efficient
context modeling.
- **High-Sparsity MoE**: Achieves an extremely low activation ratio of
1:50 in MoE layers, drastically reducing FLOPs per token while
preserving model capacity.
- **Multi-Token Prediction (MTP)**: Boosts pretraining model performance
and accelerates inference.
- **Other Optimizations**: Includes techniques such as **zero-centered
and weight-decayed layernorm**, **Gated Attention**, and other
stabilizing enhancements for robust training.
Built on this architecture, they trained and open-sourced
Qwen3-Next-80B-A3B — 80B total parameters, only 3B active — achieving
extreme sparsity and efficiency.
Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream
tasks — while requiring **less than 1/10 of the training cost**.
Moreover, it delivers over **10x higher inference throughput** than
Qwen3-32B when handling contexts longer than 32K tokens.
For more details, please see the Qwen3-Next [blog
post](https://qwenlm.github.io/blog/qwen3_next/).
- Adding Support for Qwen3-Next by
[@​bozheng-hit](https://redirect.github.com/bozheng-hit) in
[#​40771](https://redirect.github.com/huggingface/transformers/issues/40771)
##### Vault Gemma
<img width="1282" height="392" alt="image"
src="https://github.com/user-attachments/assets/9412905b-4083-4994-9000-aa0dbf97eb6f"
/>
[VaultGemma](https://services.google.com/fh/files/blogs/vaultgemma_tech_report.pdf)
is a text-only decoder model derived from [Gemma
2](https://huggingface.co/docs/transformers/en/model_doc/gemma2).
Notably, it drops the norms after the attention and MLP blocks, and uses
full attention for all layers instead of alternating between full
attention and local sliding attention. VaultGemma is available as a
pretrained model with 1B parameters that uses a 1024-token sequence
length.
VaultGemma was trained from scratch with sequence-level differential
privacy (DP). Its training data includes the same mixture as the [Gemma
2
models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315),
consisting of a number of documents of varying lengths. Additionally, it
is trained using [DP stochastic gradient descent
(DP-SGD)](https://arxiv.org/abs/1607.00133) and provides a (ε ≤ 2.0, δ ≤
1.1e-10)-sequence-level DP guarantee, where a sequence consists of 1024
consecutive tokens extracted from heterogeneous data sources.
Specifically, the privacy unit of the guarantee is for the sequences
after sampling and packing of the mixture.
- add: differential privacy research model by
[@​RyanMullins](https://redirect.github.com/RyanMullins) in
[#​40851](https://redirect.github.com/huggingface/transformers/issues/40851)
##### Qwen3 VL
<img width="3544" height="1886" alt="image"
src="https://github.com/user-attachments/assets/5afa70cb-506e-4d56-baa3-30e7522ac653"
/>
[Qwen3-VL](https://huggingface.co/papers/2502.13923) is a multimodal
vision-language model series, encompassing both dense and MoE variants,
as well as Instruct and Thinking versions.
Building upon its predecessors, Qwen3-VL delivers significant
improvements in visual understanding while maintaining strong pure text
capabilities. Key architectural advancements include: enhanced MRope
with interleaved layout for better spatial-temporal modeling, DeepStack
integration to effectively leverage multi-level features from the Vision
Transformer (ViT), and improved video understanding through text-based
time alignment—evolving from T-RoPE to text timestamp alignment for more
precise temporal grounding.
These innovations collectively enable Qwen3-VL to achieve superior
performance in complex multimodal tasks.
- Adding Support for Qwen3-VL Series by
[@​JJJYmmm](https://redirect.github.com/JJJYmmm) in
[#​40795](https://redirect.github.com/huggingface/transformers/issues/40795)
##### Longcat Flash
<img width="763" height="468" alt="image"
src="https://github.com/user-attachments/assets/289d33e0-6c71-458d-ae07-b7d454ac2adf"
/>
The LongCatFlash model was proposed in [LongCat-Flash Technical
Report](https://huggingface.co/papers/2509.01322) by the Meituan LongCat
Team. LongCat-Flash is a 560B parameter Mixture-of-Experts (MoE) model
that activates 18.6B-31.3B parameters dynamically (average \~27B). The
model features a shortcut-connected architecture enabling high inference
speed (>100 tokens/second) and advanced reasoning capabilities.
The abstract from the paper is the following:
*We present LongCat-Flash, a 560 billion parameter Mixture-of-Experts
(MoE) language model featuring a dynamic computation mechanism that
activates 18.6B-31.3B parameters based on context (average \~27B). The
model incorporates a shortcut-connected architecture enabling high
inference speed (>100 tokens/second) and demonstrates strong performance
across multiple benchmarks including 89.71% accuracy on MMLU and
exceptional agentic tool use capabilities.*
Tips:
- LongCat-Flash uses a unique shortcut-connected MoE architecture that
enables faster inference compared to traditional MoE models
- The model supports up to 128k context length for long-form tasks
- Dynamic parameter activation makes it computationally efficient while
maintaining high performance
- Best suited for applications requiring strong reasoning, coding, and
tool-calling capabilities
- The MoE architecture includes zero experts (nn.Identity modules) which
act as skip connections, allowing tokens to bypass expert computation
when appropriate
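The zero-expert idea can be sketched in a few lines of plain Python (toy names, not the LongCat-Flash implementation): an identity expert simply returns its input, so a token routed to it bypasses expert computation.

```python
# Toy sketch of "zero experts" (illustrative names, not the LongCat-Flash API).
# A zero expert is a pass-through, equivalent to nn.Identity: a token routed
# to it skips expert computation entirely and flows through unchanged.

def ffn_expert(x):
    # stand-in for a real feed-forward expert
    return [2.0 * v for v in x]

def zero_expert(x):
    # identity pass-through: acts as a skip connection
    return x

experts = [ffn_expert, zero_expert]

token = [1.0, 2.0, 3.0]
print(experts[1](token))  # routed to the zero expert: token is unchanged
```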
* Add LongCat-Flash by
[@​molbap](https://redirect.github.com/molbap) in
[#​40730](https://redirect.github.com/huggingface/transformers/issues/40730)
##### Flex Olmo
<img width="700" height="414" alt="image"
src="https://github.com/user-attachments/assets/7b92ee0f-5f5a-459c-ad4d-e01b5c10202e"
/>
[FlexOlmo](https://huggingface.co/papers/2507.07024) is a new class of
language models (LMs) that supports (1) distributed training without
data sharing, where different model parameters are independently trained
on closed datasets, and (2) data-flexible inference, where these
parameters along with their associated data can be flexibly included or
excluded from model inferences with no further training. FlexOlmo
employs a mixture-of-experts (MoE) architecture where each expert is
trained independently on closed datasets and later integrated through a
new domain-informed routing without any joint training. FlexOlmo is
trained on FlexMix, a corpus we curate comprising publicly available
datasets alongside seven domain-specific sets, representing realistic
approximations of closed sets.
You can find all the original FlexOlmo checkpoints under the
[FlexOlmo](https://huggingface.co/collections/allenai/flexolmo-68471177a386b6e20a54c55f)
collection.
- Add FlexOlmo model by
[@​2015aroras](https://redirect.github.com/2015aroras) in
[#​40921](https://redirect.github.com/huggingface/transformers/issues/40921)
##### LFM2 VL
<img width="2300" height="1400" alt="image"
src="https://github.com/user-attachments/assets/ef0605cd-9512-458c-915a-62316e14d90c"
/>
[LFM2-VL](https://www.liquid.ai/blog/lfm2-vl-efficient-vision-language-models)
is the first series of vision-language foundation models developed by [Liquid
AI](https://liquid.ai/). These multimodal models are designed for
low-latency and device-aware deployment. LFM2-VL extends the LFM2 family
of open-weight Liquid Foundation Models (LFMs) into the vision-language
space, supporting both text and image inputs with variable resolutions.
##### Architecture
LFM2-VL consists of three main components: a language model backbone, a
vision encoder, and a multimodal projector. LFM2-VL builds upon the LFM2
backbone, inheriting from either LFM2-1.2B (for LFM2-VL-1.6B) or
LFM2-350M (for LFM2-VL-450M). For the vision tower, LFM2-VL uses SigLIP2
NaFlex encoders to convert input images into token sequences. Two
variants are implemented:
- Shape-optimized (400M) for more fine-grained vision capabilities for
LFM2-VL-1.6B
- Base (86M) for fast image processing for LFM2-VL-450M
The encoder processes images at their native resolution up to 512×512
pixels, efficiently handling smaller images without upscaling and
supporting non-standard aspect ratios without distortion. Larger images
are split into non-overlapping square patches of 512×512 each,
preserving detail. In LFM2-VL-1.6B, the model also receives a thumbnail
(a small, downscaled version of the original image capturing the overall
scene) to enhance global context understanding and alignment. Special
tokens mark each patch’s position and indicate the thumbnail’s start.
The multimodal connector is a 2-layer MLP connector with pixel unshuffle
to reduce image token count.
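A minimal sketch of the patch-count arithmetic implied by this splitting scheme (a hypothetical helper, not the library's image processor):

```python
import math

def num_patches(width: int, height: int, patch: int = 512) -> int:
    """Toy sketch: count the non-overlapping 512x512 patches covering an image.

    Images at or below 512x512 are processed at native resolution (one patch);
    larger images are split into a grid of square patches.
    """
    return math.ceil(width / patch) * math.ceil(height / patch)

print(num_patches(512, 512))   # small image: a single patch
print(num_patches(1024, 600))  # larger image: 2 x 2 = 4 patches
```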
- Add new model LFM2-VL by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40624](https://redirect.github.com/huggingface/transformers/issues/40624)
##### BLT
<img width="1448" height="1062" alt="image"
src="https://github.com/user-attachments/assets/af1fbb09-082c-4331-9217-357adb506cbf"
/>
The BLT model was proposed in [Byte Latent Transformer: Patches Scale
Better Than Tokens](https://arxiv.org/pdf/2412.09871) by Artidoro
Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller,
Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer,
Gargi Ghosh, Mike Lewis, Ari Holtzman, Srinivasan Iyer.
BLT is a byte-level LLM that achieves tokenization-level performance
through entropy-based dynamic patching.
The abstract from the paper is the following:
*We introduce the Byte Latent Transformer (BLT), a new byte-level LLM
architecture that, for the first time, matches tokenization-based LLM
performance at scale with significant improvements in inference
efficiency and robustness. BLT encodes bytes into dynamically sized
patches, which serve as the primary units of computation. Patches are
segmented based on the entropy of the next byte, allocating
more compute and model capacity where increased data complexity demands
it. We present the first flop controlled scaling study of byte-level
models up to 8B parameters and 4T training bytes. Our results
demonstrate the feasibility of scaling models trained on raw bytes
without a fixed vocabulary. Both training and inference efficiency
improve due to dynamically selecting long patches when data is
predictable, along with qualitative improvements on reasoning and long
tail generalization. Overall, for fixed inference costs, BLT shows
significantly better scaling than tokenization-based models, by
simultaneously growing both patch and model size.*
##### Usage Tips:
- **Dual Model Architecture**: BLT consists of two separately trained
  models:
  - **Patcher (Entropy Model)**: A smaller transformer model that predicts
    byte-level entropy to determine patch boundaries and segment input.
  - **Main Transformer Model**: The primary model that processes the
    patches through a Local Encoder, Global Transformer, and Local Decoder.
- **Dynamic Patching**: The model uses entropy-based dynamic patching
  where:
  - High-entropy regions (complex data) get shorter patches with more
    computational attention
  - Low-entropy regions (predictable data) get longer patches for
    efficiency
  - This allows the model to allocate compute resources where they're most
    needed
- **Local Encoder**: Processes byte sequences with cross-attention to
patch embeddings
- **Global Transformer**: Processes patch-level representations with
full attention across patches
- **Local Decoder**: Generates output with cross-attention back to the
original byte sequence
- **Byte-Level Tokenizer**: Unlike traditional tokenizers that use
learned vocabularies, BLT's tokenizer simply converts text to UTF-8
bytes and maps each byte to a token ID. There is no need for a
vocabulary.
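The byte-level scheme can be illustrated directly (a sketch of the idea, not BLT's actual tokenizer class):

```python
# Byte-level "tokenization" sketch: UTF-8 bytes double as token IDs,
# so no learned vocabulary is needed (there are only 256 byte values).
text = "héllo"
token_ids = list(text.encode("utf-8"))  # multi-byte chars like "é" yield two IDs
decoded = bytes(token_ids).decode("utf-8")

print(token_ids)  # six byte-level IDs for a five-character string
print(decoded)
```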
* blt wip by [@​itazap](https://redirect.github.com/itazap) in
[#​38579](https://redirect.github.com/huggingface/transformers/issues/38579)
##### Qwen3 Omni MoE
<img width="14084" height="7429" alt="image"
src="https://github.com/user-attachments/assets/20d46a43-15f2-42bf-9703-9575f5ca4430"
/>
The [Qwen2.5-Omni](https://qwenlm.github.io/blog/qwen2.5-omni/) model is
a unified multimodal model proposed in the [Qwen2.5-Omni Technical
Report](https://huggingface.co/papers/2503.20215) from the Qwen team,
Alibaba Group.
##### Notes
- Use `Qwen2_5OmniForConditionalGeneration` to generate both audio and
text output. To generate only one output type, use
`Qwen2_5OmniThinkerForConditionalGeneration` for text-only and
`Qwen2_5OmniTalkerForConditionalGeneration` for audio-only outputs.
- Audio generation with `Qwen2_5OmniForConditionalGeneration` supports
only a batch size of 1 at the moment.
- In case of out-of-memory errors when working with video input,
decrease `processor.max_pixels`. By default the maximum is set to a very
large value, and high-resolution visuals will not be resized unless their
resolution exceeds `processor.max_pixels`.
- The processor has its own `~ProcessorMixin.apply_chat_template`
method to convert chat messages to model inputs.
* Adding support for Qwen3Omni by
[@​BakerBunker](https://redirect.github.com/BakerBunker) in
[#​41025](https://redirect.github.com/huggingface/transformers/issues/41025)
##### Parakeet
<img width="1431" height="527" alt="image"
src="https://github.com/user-attachments/assets/e831f451-9be3-4b5c-a222-b833a50ceb2a"
/>
Parakeet models, [introduced by NVIDIA
NeMo](https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/),
combine a [Fast
Conformer](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/models.html#fast-conformer)
encoder with a connectionist temporal classification (CTC), recurrent
neural network transducer (RNNT), or token and duration transducer (TDT)
decoder for automatic speech recognition.
**Model Architecture**
- **Fast Conformer Encoder**: A linearly scalable Conformer architecture
that processes mel-spectrogram features and reduces sequence length
through subsampling. This is a more efficient version of the Conformer
encoder found in [FastSpeech2Conformer](./fastspeech2_conformer.md) (see
`ParakeetEncoder` for the encoder implementation and details).
- [**ParakeetForCTC**](#parakeetforctc): a Fast Conformer Encoder + a
CTC decoder
- **CTC Decoder**: Simple but effective decoder consisting of:
- 1D convolution projection from encoder hidden size to vocabulary size
(for optimal NeMo compatibility).
- CTC loss computation for training.
- Greedy CTC decoding for inference.
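Greedy CTC decoding, as described above, collapses repeated frame predictions and then drops blanks; a minimal sketch (illustrative function, not the `ParakeetForCTC` implementation):

```python
def ctc_greedy_decode(ids, blank=0):
    """Standard greedy CTC decoding: collapse repeats, then drop blanks."""
    out, prev = [], None
    for i in ids:
        # emit a label only when it differs from the previous frame
        # and is not the blank symbol
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# frame-level argmax IDs -> label sequence
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 2, 3]))  # [1, 2, 3]
```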
* Add Parakeet by
[@​nithinraok](https://redirect.github.com/nithinraok) in
[#​39062](https://redirect.github.com/huggingface/transformers/issues/39062)
##### EdgeTAM
<img width="949" height="537" alt="image"
src="https://github.com/user-attachments/assets/5ca4e73d-5aa9-487d-96e1-92d4f2f4739f"
/>
The EdgeTAM model was proposed in [EdgeTAM: On-Device Track Anything
Model](https://huggingface.co/papers/2501.07256) by Chong Zhou, Chenchen
Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman
Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran.
EdgeTAM is an efficient adaptation of SAM 2 that introduces a 2D Spatial
Perceiver architecture to optimize memory attention mechanisms for
real-time video segmentation on mobile devices.
- Add EdgeTAM by
[@​yonigozlan](https://redirect.github.com/yonigozlan) in
[#​39800](https://redirect.github.com/huggingface/transformers/issues/39800)
##### OLMO3
More details to come soon :eyes:
- Add Olmo3 model by
[@​2015aroras](https://redirect.github.com/2015aroras) in
[#​40778](https://redirect.github.com/huggingface/transformers/issues/40778)
#### Continuous batching
We are introducing Continuous Batching (CB) in this release, and we
consider it a stable feature. The main use case for CB is batched generation,
which makes it very efficient in the context of GRPO training or
evaluation. Thanks to CB, researchers or model developers are now free
to use transformers in these contexts without having to spin up an
additional inference engine.
CB currently supports both full attention and sliding window attention:
this means that the vast majority of models are supported, like llama,
gemma3, gpt-oss.
CB is also integrated with `transformers serve`, which means that you can
deploy transformers as an OpenAI-compatible HTTP server.
Here is a small snippet on how to use it:
```python
import datasets
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507", dtype=torch.bfloat16, _attn_implementation="sdpa_paged", device_map="auto"
)
model.generation_config.max_new_tokens = 32
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507", padding_side="left")

dataset = datasets.load_dataset("openai/gsm8k", "socratic", split="test")
tokenized_datasets = dataset.map(lambda x: tokenizer(x["question"]), batched=True)
simple_batch_inputs = [item["input_ids"] for item in tokenized_datasets]

batch_outputs = model.generate_batch(inputs=simple_batch_inputs)
for request in batch_outputs:
    print(tokenizer.decode(batch_outputs[request].generated_tokens))
"""
Let's break down the problem step by step:
1. **Total eggs laid per day**:
Janet’s ducks lay **16 eggs per day**
Let's break down the problem step by step:
1. **Blue fiber**: The robe takes **2 bolts** of blue fiber.
2. **White fiber
To determine Josh's profit from flipping the house, let's go step by step.
---
##### Step 1: Initial cost of the house
Josh buys the
To find the total distance James runs in a week, we can break down the problem step by step:
1. **Sprints per session**: James runs
To determine how many cups of feed Wendi needs to give her chickens in the final meal of the day, let's go step by step.
"""
```
#### Breaking changes
- 🚨 Remove Group Beam Search decoding strategy by
[@​manueldeprada](https://redirect.github.com/manueldeprada) in
[#​40495](https://redirect.github.com/huggingface/transformers/issues/40495)
- 🚨 Remove Constrained Beam Search decoding strategy by
[@​manueldeprada](https://redirect.github.com/manueldeprada) in
[#​40518](https://redirect.github.com/huggingface/transformers/issues/40518)
- 🚨 Allow `check_model_inputs` in core VLMs by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40342](https://redirect.github.com/huggingface/transformers/issues/40342)
- 🔴 Update Glm4V to use config values by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40712](https://redirect.github.com/huggingface/transformers/issues/40712)
- 🚨 Fix Inconsistant `input_feature` length and `attention_mask` length
in `WhisperFeatureExtractor` by
[@​BakerBunker](https://redirect.github.com/BakerBunker) in
[#​39221](https://redirect.github.com/huggingface/transformers/issues/39221)
- ⚠️ 🔴 Add ministral model by
[@​manueldeprada](https://redirect.github.com/manueldeprada) in
[#​40247](https://redirect.github.com/huggingface/transformers/issues/40247)
- 🔴 Move variable output controls to `_prepare_generation_config ` by
[@​manueldeprada](https://redirect.github.com/manueldeprada) in
[#​40715](https://redirect.github.com/huggingface/transformers/issues/40715)
- 🔴 Make `center_crop` fast equivalent to slow by
[@​yonigozlan](https://redirect.github.com/yonigozlan) in
[#​40856](https://redirect.github.com/huggingface/transformers/issues/40856)
#### Bugfixes and improvements
- Fix collated reports upload filename by
[@​ivarflakstad](https://redirect.github.com/ivarflakstad) in
[#​40556](https://redirect.github.com/huggingface/transformers/issues/40556)
- pin `pytest-rerunfailures<16.0` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40561](https://redirect.github.com/huggingface/transformers/issues/40561)
- remove the redundant non maintained jieba and use rjieba instead by
[@​divyanshsinghvi](https://redirect.github.com/divyanshsinghvi)
in
[#​40383](https://redirect.github.com/huggingface/transformers/issues/40383)
- Set `test_all_params_have_gradient=False` for `DeepseekV2ModelTest` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40566](https://redirect.github.com/huggingface/transformers/issues/40566)
- processor tests - use dummy videos by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40537](https://redirect.github.com/huggingface/transformers/issues/40537)
- \[qwen-vl] fix position ids by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40490](https://redirect.github.com/huggingface/transformers/issues/40490)
- Fix `test_eager_matches_sdpa_inference` not run for `CLIP` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40581](https://redirect.github.com/huggingface/transformers/issues/40581)
- Fix CircleCI step passes in the case of pytest worker crash at test
collection time by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40552](https://redirect.github.com/huggingface/transformers/issues/40552)
- Allow `remi-or` to `run-slow` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40590](https://redirect.github.com/huggingface/transformers/issues/40590)
- Fix llava image processor by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40588](https://redirect.github.com/huggingface/transformers/issues/40588)
- Update `get_*_features` methods + update doc snippets by
[@​qubvel](https://redirect.github.com/qubvel) in
[#​40555](https://redirect.github.com/huggingface/transformers/issues/40555)
- Fix custom generate relative imports by
[@​manueldeprada](https://redirect.github.com/manueldeprada) in
[#​40480](https://redirect.github.com/huggingface/transformers/issues/40480)
- Support batch size > 1 image-text inference by
[@​hiyouga](https://redirect.github.com/hiyouga) in
[#​36682](https://redirect.github.com/huggingface/transformers/issues/36682)
- Fix typos by [@​cyyever](https://redirect.github.com/cyyever) in
[#​40585](https://redirect.github.com/huggingface/transformers/issues/40585)
- Skip `TvpImageProcessingTest::test_slow_fast_equivalence` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40593](https://redirect.github.com/huggingface/transformers/issues/40593)
- Fix inexistent imports by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40580](https://redirect.github.com/huggingface/transformers/issues/40580)
- Add Copilot instructions by
[@​Rocketknight1](https://redirect.github.com/Rocketknight1) in
[#​40432](https://redirect.github.com/huggingface/transformers/issues/40432)
- Fix `siglip` flaky `test_eager_matches_sdpa_inference` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40584](https://redirect.github.com/huggingface/transformers/issues/40584)
- Fix for missing default values in encoder decoder by
[@​remi-or](https://redirect.github.com/remi-or) in
[#​40517](https://redirect.github.com/huggingface/transformers/issues/40517)
- Fix quite a lot of FA tests by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez) in
[#​40548](https://redirect.github.com/huggingface/transformers/issues/40548)
- \[`Tests`] Fixup duplicated mrope logic by
[@​vasqu](https://redirect.github.com/vasqu) in
[#​40592](https://redirect.github.com/huggingface/transformers/issues/40592)
- Reduce more test data fetch by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40595](https://redirect.github.com/huggingface/transformers/issues/40595)
- Pin torchcodec to 0.5 in AMD docker by
[@​remi-or](https://redirect.github.com/remi-or) in
[#​40598](https://redirect.github.com/huggingface/transformers/issues/40598)
- Multiple fixes to FA tests in AMD by
[@​remi-or](https://redirect.github.com/remi-or) in
[#​40498](https://redirect.github.com/huggingface/transformers/issues/40498)
- Disable cache for `TokenizerTesterMixin` temporarily by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40611](https://redirect.github.com/huggingface/transformers/issues/40611)
- fix: continuous batching in `transformers serve` by
[@​McPatate](https://redirect.github.com/McPatate) in
[#​40479](https://redirect.github.com/huggingface/transformers/issues/40479)
- Fix processor chat template by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40613](https://redirect.github.com/huggingface/transformers/issues/40613)
- Avoid `too many request` caused by
`AutoModelTest::test_dynamic_saving_from_local_repo` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40614](https://redirect.github.com/huggingface/transformers/issues/40614)
- Fix flaky `JambaModelTest.test_load_balancing_loss` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40617](https://redirect.github.com/huggingface/transformers/issues/40617)
- Add collated reports job to Nvidia CI by
[@​ahadnagy](https://redirect.github.com/ahadnagy) in
[#​40470](https://redirect.github.com/huggingface/transformers/issues/40470)
- Remove unnecessary pillow version check by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40604](https://redirect.github.com/huggingface/transformers/issues/40604)
- Fix invalid typing by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40612](https://redirect.github.com/huggingface/transformers/issues/40612)
- Enable more ruff UP rules by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40579](https://redirect.github.com/huggingface/transformers/issues/40579)
- Support TF32 flag for MUSA backend by
[@​fmo-mt](https://redirect.github.com/fmo-mt) in
[#​33187](https://redirect.github.com/huggingface/transformers/issues/33187)
- Remove random flag by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez) in
[#​40629](https://redirect.github.com/huggingface/transformers/issues/40629)
- 🌐 \[i18n-KO] Translated `deepseek_v3.md` to Korean by
[@​ssum21](https://redirect.github.com/ssum21) in
[#​39649](https://redirect.github.com/huggingface/transformers/issues/39649)
- Fix `too many requests` in `TestMistralCommonTokenizer` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40623](https://redirect.github.com/huggingface/transformers/issues/40623)
- fix: gas for gemma fixed by
[@​yevvonlim](https://redirect.github.com/yevvonlim) in
[#​40591](https://redirect.github.com/huggingface/transformers/issues/40591)
- \[auto-model] propagate kwargs by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40491](https://redirect.github.com/huggingface/transformers/issues/40491)
- \[CP] Add attention\_mask to the buffer when the mask is causal by
[@​kashif](https://redirect.github.com/kashif) in
[#​40619](https://redirect.github.com/huggingface/transformers/issues/40619)
- Fix: PIL image load in Processing utils apply\_chat\_template by
[@​abdokaseb](https://redirect.github.com/abdokaseb) in
[#​40622](https://redirect.github.com/huggingface/transformers/issues/40622)
- Skip `test_prompt_lookup_decoding_matches_greedy_search` for `voxtral`
by [@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40643](https://redirect.github.com/huggingface/transformers/issues/40643)
- add DeepseekV3ForTokenClassification by
[@​bzantium](https://redirect.github.com/bzantium) in
[#​40641](https://redirect.github.com/huggingface/transformers/issues/40641)
- fix MetaCLIP 2 wrong link & wrong model names in the docstrings by
[@​voidism](https://redirect.github.com/voidism) in
[#​40565](https://redirect.github.com/huggingface/transformers/issues/40565)
- Remove TF/Flax examples by
[@​Rocketknight1](https://redirect.github.com/Rocketknight1) in
[#​40654](https://redirect.github.com/huggingface/transformers/issues/40654)
- Mark `LongformerModelTest::test_attention_outputs` as flaky by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40655](https://redirect.github.com/huggingface/transformers/issues/40655)
- fix pipeline dtype by
[@​jiqing-feng](https://redirect.github.com/jiqing-feng) in
[#​40638](https://redirect.github.com/huggingface/transformers/issues/40638)
- feat(serving): add healthcheck by
[@​McPatate](https://redirect.github.com/McPatate) in
[#​40653](https://redirect.github.com/huggingface/transformers/issues/40653)
- Fix Metaclip modular conversion by
[@​Rocketknight1](https://redirect.github.com/Rocketknight1) in
[#​40660](https://redirect.github.com/huggingface/transformers/issues/40660)
- Avoid attention\_mask copy in qwen2.5 by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40658](https://redirect.github.com/huggingface/transformers/issues/40658)
- Allow custom args in `custom_generate` Callables and unify generation
args structure by
[@​manueldeprada](https://redirect.github.com/manueldeprada) in
[#​40586](https://redirect.github.com/huggingface/transformers/issues/40586)
- Update `check_determinism` inside `test_determinism` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40661](https://redirect.github.com/huggingface/transformers/issues/40661)
- Skip `test_fast_is_faster_than_slow` for `Owlv2ImageProcessingTest` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40663](https://redirect.github.com/huggingface/transformers/issues/40663)
- Fix warning for output\_attentions=True by
[@​qubvel](https://redirect.github.com/qubvel) in
[#​40597](https://redirect.github.com/huggingface/transformers/issues/40597)
- Skip `test_prompt_lookup_decoding_matches_greedy_search` for
`qwen2_audio` by [@​ydshieh](https://redirect.github.com/ydshieh)
in
[#​40664](https://redirect.github.com/huggingface/transformers/issues/40664)
- Remove overwritten `GitModelTest::test_beam_search_generate` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40666](https://redirect.github.com/huggingface/transformers/issues/40666)
- refactor: use `tolist` instead of list comprehension calling `.item()`
by [@​McPatate](https://redirect.github.com/McPatate) in
[#​40646](https://redirect.github.com/huggingface/transformers/issues/40646)
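The `tolist` refactor named in the PR title above is a general pattern worth noting: converting an array to a Python list in one vectorized call is cleaner and faster than calling `.item()` element by element. A minimal sketch of the idea, using NumPy rather than the tensors the PR actually touches:

```python
import numpy as np

t = np.arange(4)

# Per-element conversion: one Python-level call per element.
slow = [x.item() for x in t]

# Single vectorized conversion to a plain Python list.
fast = t.tolist()

assert slow == fast == [0, 1, 2, 3]
```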
- Benchmarking V2: framework impl by
[@​ahadnagy](https://redirect.github.com/ahadnagy) in
[#​40486](https://redirect.github.com/huggingface/transformers/issues/40486)
- Even more test data cached by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40636](https://redirect.github.com/huggingface/transformers/issues/40636)
- Skip more fast v.s slow image processor tests by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40675](https://redirect.github.com/huggingface/transformers/issues/40675)
- Avoid night torch CI not run because of irrelevant docker image
failing to build by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40677](https://redirect.github.com/huggingface/transformers/issues/40677)
- Mark
`Aimv2ModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels`
as flaky by [@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40683](https://redirect.github.com/huggingface/transformers/issues/40683)
- CircleCI docker images cleanup / update / fix by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40681](https://redirect.github.com/huggingface/transformers/issues/40681)
- Add sequence classification support for small Gemma 3 text models by
[@​abdokaseb](https://redirect.github.com/abdokaseb) in
[#​40562](https://redirect.github.com/huggingface/transformers/issues/40562)
- Add codebook\_dim attribute to DacVectorQuantize for
DacResidualVectorQuantize.from\_latents() by
[@​flavioialongo](https://redirect.github.com/flavioialongo) in
[#​40665](https://redirect.github.com/huggingface/transformers/issues/40665)
- fix broken offline mode when loading tokenizer from hub by
[@​winglian](https://redirect.github.com/winglian) in
[#​40669](https://redirect.github.com/huggingface/transformers/issues/40669)
- Load a tiny video to make CI faster by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40684](https://redirect.github.com/huggingface/transformers/issues/40684)
- Final test data cache - inside CI docker images by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40689](https://redirect.github.com/huggingface/transformers/issues/40689)
- add: embedding model by
[@​RyanMullins](https://redirect.github.com/RyanMullins) in
[#​40694](https://redirect.github.com/huggingface/transformers/issues/40694)
- feat: support request cancellation by
[@​McPatate](https://redirect.github.com/McPatate) in
[#​40599](https://redirect.github.com/huggingface/transformers/issues/40599)
- Fixing bug in Voxtral when merging text and audio embeddings by
[@​rcogill](https://redirect.github.com/rcogill) in
[#​40671](https://redirect.github.com/huggingface/transformers/issues/40671)
- Change docker image to preview for the MI355 CI by
[@​ahadnagy](https://redirect.github.com/ahadnagy) in
[#​40693](https://redirect.github.com/huggingface/transformers/issues/40693)
- Fix backward compatibility with accelerate in Trainer by
[@​qgallouedec](https://redirect.github.com/qgallouedec) in
[#​40668](https://redirect.github.com/huggingface/transformers/issues/40668)
- Fix self.dropout\_p is not defined for SamAttention/Sam2Attention by
[@​yonigozlan](https://redirect.github.com/yonigozlan) in
[#​40667](https://redirect.github.com/huggingface/transformers/issues/40667)
- \[Glm4.5V] fix vLLM support by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40696](https://redirect.github.com/huggingface/transformers/issues/40696)
- Fix broken Llama4 accuracy in MoE part by
[@​nvpohanh](https://redirect.github.com/nvpohanh) in
[#​40609](https://redirect.github.com/huggingface/transformers/issues/40609)
- Avoid `T5GemmaModelTest::test_eager_matches_sdpa_inference` being
flaky by [@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40702](https://redirect.github.com/huggingface/transformers/issues/40702)
- Align assisted generate for unified signature in decoding methods by
[@​manueldeprada](https://redirect.github.com/manueldeprada) in
[#​40657](https://redirect.github.com/huggingface/transformers/issues/40657)
- Fetch one missing test data by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40703](https://redirect.github.com/huggingface/transformers/issues/40703)
- Add Fast Image Processor for ImageGPT by
[@​agamjots05](https://redirect.github.com/agamjots05) in
[#​39592](https://redirect.github.com/huggingface/transformers/issues/39592)
- Fetch more test data with `hf_hub_download` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40710](https://redirect.github.com/huggingface/transformers/issues/40710)
- feat(serve): add healthcheck test by
[@​McPatate](https://redirect.github.com/McPatate) in
[#​40697](https://redirect.github.com/huggingface/transformers/issues/40697)
- Fix parent classes of ProcessingKwargs by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40676](https://redirect.github.com/huggingface/transformers/issues/40676)
- \[tests] fix blip2 edge case by
[@​gante](https://redirect.github.com/gante) in
[#​40699](https://redirect.github.com/huggingface/transformers/issues/40699)
- \[moduar] Add missing `self` in post-process methods by
[@​framonmar7](https://redirect.github.com/framonmar7) in
[#​40711](https://redirect.github.com/huggingface/transformers/issues/40711)
- \[onnx] use logical `or` for grounding dino mask by
[@​lmarshall12](https://redirect.github.com/lmarshall12) in
[#​40625](https://redirect.github.com/huggingface/transformers/issues/40625)
- Fix parent classes of AllKwargsForChatTemplate by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40685](https://redirect.github.com/huggingface/transformers/issues/40685)
- Fix arguments by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40605](https://redirect.github.com/huggingface/transformers/issues/40605)
- \[serve] re-enable tests by
[@​gante](https://redirect.github.com/gante) in
[#​40717](https://redirect.github.com/huggingface/transformers/issues/40717)
- \[tests] remove overwrites of removed test by
[@​gante](https://redirect.github.com/gante) in
[#​40720](https://redirect.github.com/huggingface/transformers/issues/40720)
- Add Optional typing by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40686](https://redirect.github.com/huggingface/transformers/issues/40686)
- \[`Gemma Embedding`] Fix SWA by
[@​vasqu](https://redirect.github.com/vasqu) in
[#​40700](https://redirect.github.com/huggingface/transformers/issues/40700)
- Keypoint matching docs by
[@​merveenoyan](https://redirect.github.com/merveenoyan) in
[#​40541](https://redirect.github.com/huggingface/transformers/issues/40541)
- Skip `VitMatteImageProcessingTest::test_fast_is_faster_than_slow` by
[@​ydshieh](https://redirect.github.com/ydshieh) in
[#​40713](https://redirect.github.com/huggingface/transformers/issues/40713)
- refactor(serve): move `request_id` to headers by
[@​McPatate](https://redirect.github.com/McPatate) in
[#​40722](https://redirect.github.com/huggingface/transformers/issues/40722)
- \[Continous Batching] fix do\_Sample=True in continuous batching by
[@​kashif](https://redirect.github.com/kashif) in
[#​40692](https://redirect.github.com/huggingface/transformers/issues/40692)
- Fix order of mask functions when using `and/or_mask_function` by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez) in
[#​40753](https://redirect.github.com/huggingface/transformers/issues/40753)
- Fix np array typing by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40741](https://redirect.github.com/huggingface/transformers/issues/40741)
- Set accepts\_loss\_kwargs to False for
ConvNext(|V2)ForImageClassification by
[@​clinty](https://redirect.github.com/clinty) in
[#​40746](https://redirect.github.com/huggingface/transformers/issues/40746)
- Add BF16 support check for MUSA backend by
[@​fmo-mt](https://redirect.github.com/fmo-mt) in
[#​40576](https://redirect.github.com/huggingface/transformers/issues/40576)
- remove gemmas eager training warning by
[@​August-murr](https://redirect.github.com/August-murr) in
[#​40744](https://redirect.github.com/huggingface/transformers/issues/40744)
- remove FSDP prefix when using save\_pretrained with FSDP2 by
[@​winglian](https://redirect.github.com/winglian) in
[#​40207](https://redirect.github.com/huggingface/transformers/issues/40207)
- feat: err when unsupported attn impl is set w/ `--continuous_batching`
by [@​McPatate](https://redirect.github.com/McPatate) in
[#​40618](https://redirect.github.com/huggingface/transformers/issues/40618)
- docs: add continuous batching to serving by
[@​McPatate](https://redirect.github.com/McPatate) in
[#​40758](https://redirect.github.com/huggingface/transformers/issues/40758)
- Remove unnecessary tildes from documentation by
[@​st81](https://redirect.github.com/st81) in
[#​40748](https://redirect.github.com/huggingface/transformers/issues/40748)
- Fix more typos by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40627](https://redirect.github.com/huggingface/transformers/issues/40627)
- Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs by
[@​clinty](https://redirect.github.com/clinty) in
[#​39364](https://redirect.github.com/huggingface/transformers/issues/39364)
- Fix `continue_final_message` in `apply_chat_template` to prevent
substring matching issues by
[@​abdokaseb](https://redirect.github.com/abdokaseb) in
[#​40732](https://redirect.github.com/huggingface/transformers/issues/40732)
- 🌐 \[i18n-KO] Translated 'xclip.md' to Korean by
[@​ssum21](https://redirect.github.com/ssum21) in
[#​39594](https://redirect.github.com/huggingface/transformers/issues/39594)
- Fix Bark failing tests by
[@​ebezzam](https://redirect.github.com/ebezzam) in
[#​39478](https://redirect.github.com/huggingface/transformers/issues/39478)
- Add EfficientLoFTRImageProcessorFast for GPU-accelerated image
processing by [@​LawJarp-A](https://redirect.github.com/LawJarp-A)
in
[#​40215](https://redirect.github.com/huggingface/transformers/issues/40215)
- Fix: swanlab `public.cloud.experiment_url` api error by
[@​Zeyi-Lin](https://redirect.github.com/Zeyi-Lin) in
[#​40763](https://redirect.github.com/huggingface/transformers/issues/40763)
- \[generate] `PromptLookupCandidateGenerator` won't generate forbidden
tokens by [@​gante](https://redirect.github.com/gante) in
[#​40726](https://redirect.github.com/huggingface/transformers/issues/40726)
- Support sliding window in CB by
[@​remi-or](https://redirect.github.com/remi-or) in
[#​40688](https://redirect.github.com/huggingface/transformers/issues/40688)
- \[deprecations] Remove generate-related deprecations up to v4.56 by
[@​gante](https://redirect.github.com/gante) in
[#​40729](https://redirect.github.com/huggingface/transformers/issues/40729)
- rm src/transformers/convert\_pytorch\_checkpoint\_to\_tf2.py by
[@​gante](https://redirect.github.com/gante) in
[#​40718](https://redirect.github.com/huggingface/transformers/issues/40718)
- \[tests] update `test_past_key_values_format` and delete overwrites by
[@​gante](https://redirect.github.com/gante) in
[#​40701](https://redirect.github.com/huggingface/transformers/issues/40701)
- \[RoPE] run RoPE tests when the model uses RoPE by
[@​gante](https://redirect.github.com/gante) in
[#​40630](https://redirect.github.com/huggingface/transformers/issues/40630)
- Fix crash when executing MambaCache sample code by
[@​torotoki](https://redirect.github.com/torotoki) in
[#​40557](https://redirect.github.com/huggingface/transformers/issues/40557)
- \[pipeline] ASR pipeline kwargs are forwared to `generate` by
[@​gante](https://redirect.github.com/gante) in
[#​40375](https://redirect.github.com/huggingface/transformers/issues/40375)
- \[docs] CPU install by
[@​stevhliu](https://redirect.github.com/stevhliu) in
[#​40631](https://redirect.github.com/huggingface/transformers/issues/40631)
- Adding Support for Qwen3-Next by
[@​bozheng-hit](https://redirect.github.com/bozheng-hit) in
[#​40771](https://redirect.github.com/huggingface/transformers/issues/40771)
- Fix gpt-oss router\_indices in EP by
[@​jiqing-feng](https://redirect.github.com/jiqing-feng) in
[#​40545](https://redirect.github.com/huggingface/transformers/issues/40545)
- Remove reference of video\_load\_backend and video\_fps for processor
by [@​cyyever](https://redirect.github.com/cyyever) in
[#​40719](https://redirect.github.com/huggingface/transformers/issues/40719)
- \[processors] Unbloating simple processors by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp) in
[#​40377](https://redirect.github.com/huggingface/transformers/issues/40377)
- Enable ruff on benchmark and scripts by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40634](https://redirect.github.com/huggingface/transformers/issues/40634)
- Fix doc for PerceptionLMForConditionalGeneration forward. by
[@​shuminghu](https://redirect.github.com/shuminghu) in
[#​40733](https://redirect.github.com/huggingface/transformers/issues/40733)
- Fix typos in tests and util by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40780](https://redirect.github.com/huggingface/transformers/issues/40780)
- Fix invalid PipelineParallel member by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40789](https://redirect.github.com/huggingface/transformers/issues/40789)
- Use functools.cached\_property by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40607](https://redirect.github.com/huggingface/transformers/issues/40607)
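For context on the `functools.cached_property` change above: the decorator computes a property once per instance and stores the result in the instance's `__dict__`, replacing hand-rolled lazy-caching attributes. A small illustrative sketch (the `Dataset` class is hypothetical, not from the PR):

```python
from functools import cached_property

class Dataset:
    calls = 0  # class-level counter to observe caching

    def __init__(self, rows):
        self.rows = rows

    @cached_property
    def size(self):
        # Runs once per instance; the result is stored on the instance.
        Dataset.calls += 1
        return len(self.rows)

d = Dataset([1, 2, 3])
assert d.size == 3
assert d.size == 3          # second access hits the cache
assert Dataset.calls == 1   # the body ran only once
```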
- Read config pattern for Qwen3Next by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez) in
[#​40792](https://redirect.github.com/huggingface/transformers/issues/40792)
- Fix dotted model names by
[@​August-murr](https://redirect.github.com/August-murr) in
[#​40745](https://redirect.github.com/huggingface/transformers/issues/40745)
- Fix the issue that csm model cannot work with pipeline mode. by
[@​yuanwu2017](https://redirect.github.com/yuanwu2017) in
[#​39349](https://redirect.github.com/huggingface/transformers/issues/39349)
- Move num\_items\_in\_batch to correct device before accelerator.gather
by [@​ssharpe42](https://redirect.github.com/ssharpe42) in
[#​40773](https://redirect.github.com/huggingface/transformers/issues/40773)
- Remove use\_ipex option from Trainer by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40784](https://redirect.github.com/huggingface/transformers/issues/40784)
- fix\_image\_processing\_fast\_for\_glm4v by
[@​lambertwjh](https://redirect.github.com/lambertwjh) in
[#​40483](https://redirect.github.com/huggingface/transformers/issues/40483)
- \[Docs] Add missing class documentation for optimizer\_schedules by
[@​jijihuny](https://redirect.github.com/jijihuny) in
[#​31870](https://redirect.github.com/huggingface/transformers/issues/31870),
[#​23010](https://redirect.github.com/huggingface/transformers/issues/23010)
- Fix DeepSpeed mixed precision precedence over Accelerate defaults by
[@​notkisk](https://redirect.github.com/notkisk) in
[#​39856](https://redirect.github.com/huggingface/transformers/issues/39856)
- feature: Add robust token counting with padding exclusion by
[@​PrathmeshAdsod](https://redirect.github.com/PrathmeshAdsod) in
[#​40416](https://redirect.github.com/huggingface/transformers/issues/40416)
- Fix edge case for tokenize by
[@​wangzhen0518](https://redirect.github.com/wangzhen0518) in
[#​36277](https://redirect.github.com/huggingface/transformers/issues/36277)
- Fix config dtype parsing for Emu3 edge case by
[@​Isotr0py](https://redirect.github.com/Isotr0py) in
[#​40766](https://redirect.github.com/huggingface/transformers/issues/40766)
- Align torch implementation of Gated DeltaNet in Qwen3-Next with fla
library. by
[@​bozheng-hit](https://redirect.github.com/bozheng-hit) in
[#​40807](https://redirect.github.com/huggingface/transformers/issues/40807)
- Fix typos in src by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40782](https://redirect.github.com/huggingface/transformers/issues/40782)
- add general hub test for Fast Image Processors in
test\_image\_processing\_utils by
[@​namgyu-youn](https://redirect.github.com/namgyu-youn) in
[#​40086](https://redirect.github.com/huggingface/transformers/issues/40086)
- Push generation config along with checkpoints by
[@​qgallouedec](https://redirect.github.com/qgallouedec) in
[#​40804](https://redirect.github.com/huggingface/transformers/issues/40804)
- \[`Jetmoe`] Fix RoPE by
[@​vasqu](https://redirect.github.com/vasqu) in
[#​40819](https://redirect.github.com/huggingface/transformers/issues/40819)
- 🌐 \[i18n-KO] Translated clipseg.md to Korean by
[@​HyunZ118](https://redirect.github.com/HyunZ118) in
[#​39903](https://redirect.github.com/huggingface/transformers/issues/39903)
- Improve torch\_dtype checks by
[@​cyyever](https://redirect.github.com/cyyever) in
[#​40808](https://redirect.github.com/huggingface/transformers/issues/40808)
- Add VideoProcessors to auto-backend requirements by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez) in
[#​40843](https://redirect.github.com/huggingface/transformers/issues/40843)
- Adds Causal Conv 1D kernel for mamba models by
[@​MekkCyber](https://redirect.github.com/MekkCyber) in
[#​40765](https://redirect.github.com/huggingface/transformers/issues/40765)
- Update no split modules in T5Gemma model by
[@​npuichigo](https://redirect.github.com/npuichigo) in
[#​40810](https://redirect.github.com/huggingface/transformers/issues/40810)
- Replace image classification loss functions to `self.loss_function` by
[@​qubvel](https://redirect.github.com/qubvel) in
[#​40764](https://redirect.github.com/huggingface/transformers/issues/40764)
- Fix the misalignment between the l2norm in GDN of Qwen3-Next and the
implementation in the FLA library. by
[@​bozheng-hit](https://redirect.github.com/bozheng-hit) in
[#​40842](https://redirect.github.com/huggingface/transformers/issues/40842)
- Fixes for continuous batching by
[@​remi-or](https://redirect.github.com/remi-or) in
[#​40828](https://redirect.github.com/huggingface/transformers/issues/40828)
- \[tests] re-enable aria fast tests by
[@​gante](https://redirect.g
</details>
---
### Configuration
📅 **Schedule**: Branch creation - "after 9am and before 5pm every
weekday, every weekend" in timezone UTC, Automerge - At any time (no
schedule defined).
🚦 **Automerge**: Enabled.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/wharflab/tally).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My45NC4xIiwidXBkYXRlZEluVmVyIjoiNDMuOTQuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIiwicmVub3ZhdGUiXX0=-->
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>