Update dependency transformers to v5 by dev-mend-for-github-com[bot] · Pull Request #5 · MuhammadAEws/amazon-bedrock-workshop

dev-mend-for-github-com · 2026-05-02T01:59:12Z

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package	Update	Change
transformers	major	`==4.49.0` → `==5.0.0rc3`

By merging this PR, the below vulnerabilities will be automatically resolved:

Severity	CVSS Score	Vulnerability
Medium	6.5	CVE-2026-1839
Medium	5.3	CVE-2025-6921
Medium	4.3	CVE-2025-1194
Low	3.5	CVE-2025-3777

Release Notes

huggingface/transformers (transformers)

`v5.0.0rc3`: Release candidate v5.0.0rc3


Release candidate v5.0.0rc3

New models:

[GLM-4.7] GLM-Lite Support by @zRzRzRzRzRzRzR in #43031
[GLM-Image] AR Model Support for GLM-Image by @zRzRzRzRzRzRzR in #43100
Add LWDetr model by @sbucaille in #40991
Add LightOnOCR model implementation by @baptiste-aubertin in #41621

What's Changed

We are getting closer and closer to the official release!
This RC is focused on removing more of the deprecated stuff, fixing some minor issues, and updating docs.

Update Japanese README to match English version by @lilin-1 in #43069
[docs] Deploying by @stevhliu in #42263
[docs] inference engines by @stevhliu in #42932
Fix typos: Remove duplicate duplicate words words by @efeecllk in #43040
[style] Rework ruff rules and update all files by @Cyrilvallez in #43144
[CB] Minor fix in kwargs by @remi-or in #43147
[Bug] qwen2_5_omni: cap generation length to be less than the max_position_embedding in DiT by @sniper35 in #43068
Fix some deprecated practices in torch 2.9 by @Cyrilvallez in #43167
Fix Fuyu processor width dimension bug in _get_num_multimodal_tokens by @Abhinavexists in #43137
Inherit from PreTrainedTokenizerBase by @juliendenize in #43143
Generation config boolean defaults by @zucchini-nlp in #43000
Fix failing BartModelIntegrationTest by @Sai-Suraj-27 in #43160
fix failure of llava/pixtral by @sywangyi in #42985
GemmaTokenizer: remove redundant whitespace pre-tokenizer by @vaibhav-research in #43106
Support auto_docstring in Processors by @yonigozlan in #42101
Fix failing BitModelIntegrationTest by @Sai-Suraj-27 in #43164
[Fp8] Fix experts by @vasqu in #43154
Docs: improve wording for documentation build instructions by @Sailnagale in #43007
[makefile] Cleanup and improve the rules by @Cyrilvallez in #43171
Some new models added stuff that was already removed by @Cyrilvallez in #43179
Fixes and compilation warning in torchao docs by @merveenoyan in #42909
[cache] Remove all deprecated classes by @Cyrilvallez in #43168
Bump huggingface_hub minimal version by @Wauplin in #43188
Rework check_config_attributes.py by @Cyrilvallez in #43191
Fix generation config validation by @zucchini-nlp in #43175
[style] Use 'x | y' syntax for processors as well by @Wauplin in #43189
Remove deprecated objects by @Cyrilvallez in #43170
fix chunked prefill implementation issue-43082 by @marcndo in #43132
Reduce add_dates verbosity by @yonigozlan in #43184
Add support for MiniMax-M2 by @rogeryoungh in #42028
Fix failing salesforce-ctrl, xlm & gpt-neo model generation tests by @Sai-Suraj-27 in #43180
Less verbose library helpers by @Cyrilvallez in #43197
run all test files on CircleCI by @ydshieh in #43146
Clamp temperature to >=1.0 for Dia generation by @Haseebasif7 in #43029
Fix spelling typos in comments and code by @raimbekovm in #43046
[docs] llama.cpp by @stevhliu in #43185
[docs] gptq formatting fix by @victorywwong in #43216
Grouped beam search from config params by @zucchini-nlp in #42472
[Generate] Allow custom config values in generate config by @vasqu in #43181
Fix failing Pix2StructIntegrationTest by @Sai-Suraj-27 in #43229
Fix missing UTF-8 encoding in check_repo.py for Windows compatibility by @aarushisingh04 in #43123
[Tokenizer] Change default value of return_dict to True in doc string for apply_chat_template by @kashif in #43223
Fix failing PhiIntegrationTests by @Sai-Suraj-27 in #43214
Use HF_TOKEN directly and remove require_read_token by @ydshieh in #43233
Fix failing Owlv2ModelIntegrationTest & OwlViTModelIntegrationTest by @Sai-Suraj-27 in #43182
Fix flashattn wrt quantized models by @SunMarc in #43145
Remove unused imports by @cyyever in #43078
Fix unsafe torch.load() in _load_rng_state allowing arbitrary code execution by @ColeMurray in #43140
Reapply modular to examples by @Cyrilvallez in #43234
More robust diff checks in add_dates by @yonigozlan in #43199
docs: fix grammatical error in README.md by @davidfertube in #43236
Fix typo: seperately → separately in lw_detr converter by @skyvanguard in #43235
Qwen-VL video processor accepts min/max pixels by @zucchini-nlp in #43228
Deprecate dtype per sub config by @zucchini-nlp in #42990
Remove more deprecated objects/args by @Cyrilvallez in #43195
[CB] Soft-reset offloading by @remi-or in #43150
Make benchmark-v2 to be device agnostic, to support more torch built-in devices like xpu by @yao-matrix in #43153
Fix benchmark script by @Cyrilvallez in #43253
Adding to run slow by @IlyasMoutawwakil in #43250
Fix failing Vip-llava model integration test by @Sai-Suraj-27 in #43252
Remove deprecated and unused position_ids in all apply_rotary_pos_emb by @Cyrilvallez in #43255
fix _get_test_info in testing_utils.py by @ydshieh in #43259
Fix failing Hiera, SwiftFormer & LED Model integration tests by @Sai-Suraj-27 in #43225
[style] Fix init isort and align makefile and CI by @Cyrilvallez in #43260
[docs] tensorrt-llm by @stevhliu in #43176
[consistency] Ensure models are added to the _toctree.yml by @Cyrilvallez in #43264
Fix failing PegasusX, Mvp & LED model integration tests by @Sai-Suraj-27 in #43245
[CB] Ensure parallel decoding test passes using FA by @remi-or in #43277
fix crash when running FSDP2+TP by @sywangyi in #43226
[ci] Fixing some failing tests for important models by @Abdennacer-Badaoui in #43231

New Contributors

@efeecllk made their first contribution in #43040
@sniper35 made their first contribution in #43068
@Abhinavexists made their first contribution in #43137
@vaibhav-research made their first contribution in #43106
@Sailnagale made their first contribution in #43007
@rogeryoungh made their first contribution in #42028
@Haseebasif7 made their first contribution in #43029
@victorywwong made their first contribution in #43216
@aarushisingh04 made their first contribution in #43123
@ColeMurray made their first contribution in #43140
@davidfertube made their first contribution in #43236
@skyvanguard made their first contribution in #43235
@baptiste-aubertin made their first contribution in #41621

Full Changelog: huggingface/transformers@v5.0.0rc2...v5.0.0rc3

`v5.0.0rc2`: Release candidate 5.0.0rc2


What's Changed

This release candidate is focused on fixing AutoTokenizer, expanding the dynamic weight loading support, and improving performance with MoEs!

MoEs and performance:

batched and grouped experts implementations by @IlyasMoutawwakil in #42697
Optimize MoEs for decoding using batched_mm by @IlyasMoutawwakil in #43126

Tokenization:

The main issue with the tokenization refactor is that tokenizer_class values are now "enforced" even though in most cases they are wrong. This took a while to properly isolate, and now we try to use TokenizersBackend whenever we can. #42894 has a much more detailed description of the big changes!

use TokenizersBackend by @ArthurZucker in #42894
Fix convert_tekken_tokenizer by @juliendenize in #42592
refactor more tokenizers - v5 guide update by @itazap in #42768
[Tokenizers] Change treatment of special tokens by @vasqu in #42903

Core

Here we focused on boosting the performance of loading weights on device!

[saving] Simplify general logic by @Cyrilvallez in #42766
Do not rely on config for inferring model dtype by @Cyrilvallez in #42838
Improve BatchFeature: stack list and lists of torch tensors by @yonigozlan in #42750
Remove tied weights from internal attribute if they are not tied by @Cyrilvallez in #42871
Enforce call to post_init and fix all of them by @Cyrilvallez in #42873
Simplify tie weights logic by @Cyrilvallez in #42895
Add buffers to _init_weights for ALL models by @Cyrilvallez in #42309
[loading] Really initialize on meta device for huge perf gains by @Cyrilvallez in #42941
Do not use accelerate hooks if the device_map has only 1 device by @Cyrilvallez in #43019
Move missing weights and non-persistent buffers to correct device earlier by @Cyrilvallez in #43021

New models

Sam: Perception Encoder Audiovisual by @eustlb in #42905
adds jais2 model support by @sarathc-cerebras in #42684
Add Pixio pre-trained models by @LiheYoung in #42795
[Ernie 4.5] Ernie VL models by @vasqu in #39585
[loading][TP] Fix device placement at loading-time, and simplify sharding primitives by @Cyrilvallez in #43003
GLM-ASR Support by @zRzRzRzRzRzRzR in #42875

Quantization

[Devstral] Make sure FP8 conversion works correctly by @patrickvonplaten in #42715
Fp8 dq by @SunMarc in #42926
[Quantization] Removing misleading int8 quantization in Finegrained FP8 by @MekkCyber in #42945
Fix deepspeed + quantization by @SunMarc in #43006

Breaking changes

Mostly around processors!

🚨 Fix ConvNeXt image processor default interpolation to BICUBIC by @lukepayyapilli in #42934
🚨 Fix EfficientNet image processor default interpolation to BICUBIC by @lukepayyapilli in #42956
Add fast version of convert_segmentation_map_to_binary_masks to EoMT by @simonreise in #43073
🚨Fix MobileViT image processor default interpolation to BICUBIC by @lukepayyapilli in #43024

Thanks again to everyone!

New Contributors

@ZX-ModelCloud made their first contribution in #42833
@AYou0207 made their first contribution in #42863
@wasertech made their first contribution in #42864
@preetam1407 made their first contribution in #42685
@Taise228 made their first contribution in #41416
@CandiedCode made their first contribution in #42885
@sarathc-cerebras made their first contribution in #42684
@nandan2003 made their first contribution in #42318
@LiheYoung made their first contribution in #42795
@majiayu000 made their first contribution in #42928
@lukepayyapilli made their first contribution in #42934
@leaderofARS made their first contribution in #42966
@qianyue76 made their first contribution in #43095
@stefgina made their first contribution in #43033
@HuiyingLi made their first contribution in #43084
@raimbekovm made their first contribution in #43038
@PredictiveManish made their first contribution in #43053
@pushkar-hue made their first contribution in #42736
@vykhovanets made their first contribution in #43042
@tanmay2004 made their first contribution in #42737
@atultw made their first contribution in #43061

Full Changelog: huggingface/transformers@v5.0.0rc1...v5.0.0rc2

`v5.0.0rc1`: Release candidate 5.0.0rc1


What's Changed

This release candidate was focused mostly on quantization support with the new dynamic weight loader, and a few notable 🚨 breaking changes🚨:

Default dtype for any model when using from_pretrained is now auto!

Default auto 🚨 🚨 by @ArthurZucker in #42805

Default shard size when saving a model is now 50GB:

🚨🚨 [saving] Default to 50GB shards, and remove non-safe serialization by @Cyrilvallez in #42734
Thanks to Xet, this is as fast as before, and larger shards are simply more convenient on the Hub.
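To make the shard-size change concrete, here is a minimal pure-Python sketch of greedy shard planning under a size cap. The helper `plan_shards`, the decimal definition of a gigabyte, and the tensor sizes are all hypothetical; the actual `save_pretrained` logic may group tensors differently.

```python
GB = 10**9  # assumed decimal gigabytes for this illustration


def plan_shards(tensor_sizes: dict[str, int], max_shard_bytes: int = 50 * GB) -> list[list[str]]:
    """Greedily pack tensors, in insertion order, into shards no larger than max_shard_bytes.

    A single tensor larger than the cap gets a shard of its own.
    """
    shards: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in tensor_sizes.items():
        # Close the current shard if adding this tensor would exceed the cap.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


# Hypothetical checkpoint: three 20 GB tensors and one 5 GB tensor.
sizes = {"embed": 20 * GB, "layer.0": 20 * GB, "layer.1": 20 * GB, "lm_head": 5 * GB}
print(plan_shards(sizes))                                # [['embed', 'layer.0'], ['layer.1', 'lm_head']]
print(len(plan_shards(sizes, max_shard_bytes=5 * GB)))  # 4
```

With a larger cap, the same checkpoint is written as fewer, larger files.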

Kwargs: they are fundamental to enabling integration with vLLM and other tools:

Every model forward() should have **kwargs by @Rocketknight1 in #42603
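A toy sketch in plain Python (no transformers imports) of why a `**kwargs` passthrough helps: an external engine can pass an extra argument, here a hypothetical `scale`, through the model's `forward()` down to a pluggable attention function without every intermediate layer having to name it. All class and function names are illustrative.

```python
from typing import Any, Callable

AttnFn = Callable[..., list[float]]


def default_attention(x: list[float], **kwargs: Any) -> list[float]:
    # Ignores extra kwargs it does not understand.
    return x


def engine_attention(x: list[float], *, scale: float = 1.0, **kwargs: Any) -> list[float]:
    # Stand-in for an external engine's kernel that consumes an extra argument.
    return [v * scale for v in x]


class ToyLayer:
    def __init__(self, attn_fn: AttnFn) -> None:
        self.attn_fn = attn_fn

    def forward(self, x: list[float], **kwargs: Any) -> list[float]:
        # Pass unknown kwargs through untouched to the attention implementation.
        return self.attn_fn(x, **kwargs)


class ToyModel:
    def __init__(self, attn_fn: AttnFn, num_layers: int = 2) -> None:
        self.layers = [ToyLayer(attn_fn) for _ in range(num_layers)]

    def forward(self, x: list[float], **kwargs: Any) -> list[float]:
        for layer in self.layers:
            x = layer.forward(x, **kwargs)
        return x


print(ToyModel(default_attention).forward([1.0, 2.0], scale=3.0))  # [1.0, 2.0]
print(ToyModel(engine_attention).forward([1.0, 2.0], scale=3.0))   # [9.0, 18.0]
```

Neither `ToyModel` nor `ToyLayer` had to change to support the engine-specific argument.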

Dynamic weight loader updates:

Mostly quality-of-life improvements and fixes, plus restored support for CPU offloading.

mark params as _is_hf_initialized with DS Zero3 from weight conversion by @winglian in #42626
[loading] Allow loading to happen without threading by @Cyrilvallez in #42619
[loading] Correctly load params during offloading & careful memory considerations by @Cyrilvallez in #42632
allow registration of custom checkpoint conversion mappings by @winglian in #42634

New models:

Add FastVLM by @camilla-deckard in #41112
Lasr model by @eustlb in #42648
[Model] Add PaddleOCR-VL Model Support by @zhang-prog in #42178

Some notable quantization fixes:

Mostly added support for fbgemm and quanto:

Fix fp8 + some enhancement by @SunMarc in #42455
Fix eetq quanto quant methods by @SunMarc in #42557
[Quantization] per tensor quantization kernel by @MekkCyber in #42560
[Quantization] fix fbgemm by @MekkCyber in #42561
[Quantization] Fix FP8 experts replacing by @MekkCyber in #42654
[Quantization] Fix Static FP8 Quantization by @MekkCyber in #42775
[core] fix fp-quant by @MekkCyber in #42613

Peft:

The dynamic weight loader broke a few small things; these PRs add glue for all models except MoEs.

FIX Error when trying to load non-LoRA PEFT by @BenjaminBossan in #42663
Fix PEFT integration with new weight loader by @Cyrilvallez in #42701

Misc

Tokenization needed more refactoring; this time it's a lot cleaner!

Refactor-tokenization-more by @ArthurZucker in #42563
Only default rope_parameters to empty dict if there is something to put in it by @hmellor in #42651

We omitted a lot of other commits for clarity, but thanks to everyone and the new contributors!

New Contributors

@camilla-deckard made their first contribution in #41112
@Aaraviitkgp made their first contribution in #42466
@ngazagna-qc made their first contribution in #40691
@arrdel made their first contribution in #42577
@marconaguib made their first contribution in #42587
@Xiao-Chenguang made their first contribution in #42436
@Furkan-rgb made their first contribution in #42465
@mertunsall made their first contribution in #42615
@anranlee99 made their first contribution in #42438
@UserChen666 made their first contribution in #42335
@efazal made their first contribution in #41723
@Harrisonyong made their first contribution in #36416
@hawon223 made their first contribution in #42384
@Bissmella made their first contribution in #42647
@AgainstEntropy made their first contribution in #42689
@dongluw made their first contribution in #42642
@hqkqn32 made their first contribution in #42620
@zhang-prog made their first contribution in #42178

Full Changelog: huggingface/transformers@v5.0.0rc0...v5.0.0rc1

`v5.0.0rc0`: Transformers v5.0.0rc0


Transformers v5 release notes

Highlights
Significant API changes: dynamic weight loading, tokenization
Backwards Incompatible Changes
Bugfixes and improvements

Highlights

We are excited to announce the initial release of Transformers v5. This is the first major release in five years, and the release is significant: 800 commits have been pushed to main since the latest minor release. This release removes a lot of long-overdue deprecations, introduces several refactors that significantly simplify our APIs and internals, and comes with a large number of bug fixes.

We give an overview of our focus for this release in the following blog post. In these release notes, we'll focus directly on the refactors and new APIs coming with v5.

This release is a release candidate (RC). It is not the final v5 release, and we will push it to PyPI as a pre-release. This means that the current release is purely opt-in: installing transformers without specifying this exact release will install the latest stable version instead (v4.57.3 as of writing).

In order to install this release, please do so with the following:

pip install transformers --pre

For us to deliver the best package possible, it is imperative that we have feedback on how the toolkit is currently working for you. Please try it out, and open an issue in case you're facing something inconsistent/a bug.

Transformers version 5 is a community endeavor, and this is the last mile. Let's ship this together!

Significant API changes

[!NOTE]
👀 Nothing is final and things are still actively in movement. We have a section dedicated to what is planned for future release candidates but is known not to work in RC0. Look for "Disclaimers for the RC0".

We'll be eagerly awaiting your feedback in our GitHub issues!

Dynamic weight loading

We introduce a new weight loading API in transformers, which significantly improves on the previous API. This
weight loading API is designed to apply operations to the checkpoints loaded by transformers.

Instead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge,
and split the layers according to how they're defined in this new API. These operations are often a necessity when
working with quantization or parallelism algorithms.

This new API is centered around the new WeightConverter class:

class WeightConverter(WeightTransform):
    operations: list[ConversionOps]
    source_keys: Union[str, list[str]]
    target_keys: Union[str, list[str]]

The weight converter is designed to apply a list of operations on the source keys, resulting in target keys. A common
operation done on the attention layers is to fuse the query, key, values layers. Doing so with this API would amount
to defining the following conversion:

conversion = WeightConverter(
    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],  # The input layers
    "self_attn.qkv_proj",  # The single layer as output
    operations=[Concatenate(dim=0)],
)

In this situation, we apply the Concatenate operation, which accepts a list of layers as input and returns a single
layer.
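The conversion above can be mimicked in plain Python to show what "source keys in, target key out" means for a state dict. `Concatenate`, `ToyWeightConverter` and the list-of-rows "tensors" below are simplified stand-ins written for illustration, not the transformers classes; real checkpoints hold torch tensors and keys are matched per layer.

```python
from dataclasses import dataclass, field
from typing import Callable

Matrix = list[list[float]]  # a "tensor" represented as a list of rows


@dataclass
class Concatenate:
    dim: int = 0

    def __call__(self, tensors: list[Matrix]) -> Matrix:
        if self.dim != 0:
            raise NotImplementedError("this sketch only concatenates along rows (dim=0)")
        return [row[:] for t in tensors for row in t]


@dataclass
class ToyWeightConverter:
    source_keys: list[str]
    target_key: str
    operations: list[Callable] = field(default_factory=list)

    def apply(self, state: dict[str, Matrix]) -> dict[str, Matrix]:
        # Keep untouched entries, then replace the source keys by the target key.
        new_state = {k: v for k, v in state.items() if k not in self.source_keys}
        value = [state[k] for k in self.source_keys]
        for op in self.operations:
            value = op(value)
        new_state[self.target_key] = value
        return new_state


state = {
    "self_attn.q_proj": [[1.0, 0.0]],
    "self_attn.k_proj": [[0.0, 1.0]],
    "self_attn.v_proj": [[1.0, 1.0]],
    "mlp.up_proj": [[2.0, 2.0]],
}
conv = ToyWeightConverter(
    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
    "self_attn.qkv_proj",
    operations=[Concatenate(dim=0)],
)
fused = conv.apply(state)
print(sorted(fused))                # ['mlp.up_proj', 'self_attn.qkv_proj']
print(fused["self_attn.qkv_proj"])  # [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```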

This allows us to define a mapping from architecture to a list of weight conversions. Those weight conversions
can apply arbitrary transformations to the layers themselves. This significantly simplifies the from_pretrained method
and helped us remove a lot of technical debt that we accumulated over the past few years.

This results in several improvements:

Much cleaner definition of transformations applied to the checkpoint
Reversible transformations, so loading and saving a checkpoint should result in the same checkpoint
Faster model loading thanks to scheduling of tensor materialization
Enables complex mix of transformations that wouldn't otherwise be possible (such as quantization + MoEs, or TP + MoEs)
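The reversibility point can be illustrated with a pair of inverse operations: a row-wise concatenation on load and a split back into the original pieces on save. This is a pure-Python sketch; `fuse` and `unfuse` are hypothetical helpers, and recording the split sizes at load time is an assumption about how an exact inverse can be obtained.

```python
Matrix = list[list[float]]


def fuse(parts: list[Matrix]) -> tuple[Matrix, list[int]]:
    """Concatenate along dim 0 and remember each part's row count."""
    fused: Matrix = [row[:] for part in parts for row in part]
    return fused, [len(p) for p in parts]


def unfuse(fused: Matrix, sizes: list[int]) -> list[Matrix]:
    """Inverse of fuse: split the rows back using the recorded sizes."""
    out: list[Matrix] = []
    start = 0
    for n in sizes:
        out.append([row[:] for row in fused[start:start + n]])
        start += n
    return out


# q is taller than k and v, as with grouped-query attention.
q = [[1.0, 2.0], [3.0, 4.0]]
k = [[5.0, 6.0]]
v = [[7.0, 8.0]]
qkv, sizes = fuse([q, k, v])
assert unfuse(qkv, sizes) == [q, k, v]  # load, then save, yields the same pieces
print(sizes)  # [2, 1, 1]
```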

While this is being implemented, expect varying levels of support across different release candidates.

Linked PR: #41580

Tokenization

Just as we moved towards a single backend library for model definition, we want our tokenizers, and the Tokenizer object, to be a lot more intuitive. With v5, tokenizer definition is much simpler; you can now initialize an empty LlamaTokenizer and train it directly on your corpus.

Defining a new tokenizer object should be as simple as this:

from transformers import TokenizersBackend, generate_merges
from tokenizers import pre_tokenizers, Tokenizer
from tokenizers.models import BPE

class Llama5Tokenizer(TokenizersBackend):
    def __init__(self, unk_token="<unk>", bos_token="<s>", eos_token="</s>", vocab=None, merges=None):
        if vocab is None:
            # Empty, trainable tokenizer: only the special tokens are in the vocab
            self._vocab = {
                str(unk_token): 0,
                str(bos_token): 1,
                str(eos_token): 2,
            }
        else:
            self._vocab = vocab

        if merges is not None:
            self._merges = merges
        else:
            # Derive merges from the vocabulary when none are given
            self._merges = generate_merges(self._vocab)

        self._tokenizer = Tokenizer(
            BPE(vocab=self._vocab, merges=self._merges, unk_token=str(unk_token), fuse_unk=True)
        )
        # Assumed Llama-style spacing: spaces become "▁", prepended to the first word only
        self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(
            replacement="▁", prepend_scheme="first", split=False
        )
        super().__init__(
            tokenizer_object=self._tokenizer,
            unk_token=unk_token,
            bos_token=bos_token,
            eos_token=eos_token,
        )

Once the tokenizer is defined as above, you can load it with Llama5Tokenizer(). This returns an empty, trainable tokenizer that follows the definition of the authors of Llama5 (it does not exist yet 😉).

The above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.

Backend Architecture Changes: moving away from the slow/fast tokenizer separation

Up to now, transformers maintained two parallel implementations for many tokenizers:

"Slow" tokenizers (tokenization_<model>.py) - Python-based implementations, often using SentencePiece as the backend.
"Fast" tokenizers (tokenization_<model>_fast.py) - Rust-based implementations using the 🤗 tokenizers library.

In v5, we consolidate to a single tokenizer file per model: tokenization_<model>.py. This file will use the most appropriate backend available:

TokenizersBackend (preferred): Rust-based tokenizers from the 🤗 tokenizers library. In general, it provides optimal performance, and it also offers many features that are commonly adopted across the ecosystem:

handling additional tokens
a full Python API for setting and updating
automatic parallelization
automatic offsets
customization
training

SentencePieceBackend: for tokenizers requiring the sentencepiece library. It inherits from PythonBackend.
PythonBackend: a Python implementation of the features provided by tokenizers. Basically allows adding tokens.
MistralCommonBackend: relies on MistralCommon's tokenization library. (Previously known as the MistralCommonTokenizer)

The AutoTokenizer automatically selects the appropriate backend based on available files and dependencies. This is transparent: you continue to use AutoTokenizer.from_pretrained() as before. It also keeps transformers modular, so that future backends can be supported easily.
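A deliberately simplified, hypothetical sketch of file-based backend selection follows. The file names are the conventional ones for each library (`tokenizer.json` for 🤗 tokenizers, `tokenizer.model` for SentencePiece, `tekken.json` for mistral-common), but the `pick_backend` function and its priority order are assumptions for illustration; the real AutoTokenizer logic also consults the model config.

```python
import importlib.util
from typing import Callable


def is_installed(module: str) -> bool:
    return importlib.util.find_spec(module) is not None


def pick_backend(files: set[str], installed: Callable[[str], bool] = is_installed) -> str:
    """Return the name of the backend a loader might choose (hypothetical rule)."""
    if "tekken.json" in files and installed("mistral_common"):
        return "MistralCommonBackend"
    if "tokenizer.json" in files and installed("tokenizers"):
        return "TokenizersBackend"  # preferred whenever a serialized tokenizer.json exists
    if "tokenizer.model" in files and installed("sentencepiece"):
        return "SentencePieceBackend"
    return "PythonBackend"


def every_dependency(_module: str) -> bool:
    # Pretend every optional dependency is installed.
    return True


print(pick_backend({"tokenizer.json", "tokenizer_config.json"}, every_dependency))  # TokenizersBackend
print(pick_backend({"tokenizer.model"}, every_dependency))                          # SentencePieceBackend
print(pick_backend({"vocab.txt"}, every_dependency))                                # PythonBackend
```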

Defining tokenizers outside of the existing backends

We enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as tokenizers, sentencepiece or mistral-common, but we offer the possibility to design the tokenizer at a higher-level, without relying on those backends.

To do so, you can import the PythonBackend (which was previously known as PreTrainedTokenizer). This class encapsulates all the logic related to added tokens, encoding, and decoding.

If you want something even higher up the stack, then PreTrainedTokenizerBase is what PythonBackend inherits from. It contains the very basic tokenizer API features:

encode
decode
vocab_size
get_vocab
convert_tokens_to_ids
convert_ids_to_tokens
from_pretrained
save_pretrained
among a few others
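To make that API surface concrete, here is a toy whitespace tokenizer in plain Python exposing methods with the same names. It is hypothetical: it does not subclass PreTrainedTokenizerBase and ignores special-token handling, padding, from_pretrained and save_pretrained.

```python
class ToyWhitespaceTokenizer:
    def __init__(self, vocab: dict[str, int], unk_token: str = "<unk>") -> None:
        self._vocab = dict(vocab)
        self._ids_to_tokens = {i: t for t, i in self._vocab.items()}
        self.unk_token = unk_token

    @property
    def vocab_size(self) -> int:
        return len(self._vocab)

    def get_vocab(self) -> dict[str, int]:
        return dict(self._vocab)

    def convert_tokens_to_ids(self, tokens: list[str]) -> list[int]:
        # Unknown tokens map to the id of unk_token.
        unk_id = self._vocab[self.unk_token]
        return [self._vocab.get(t, unk_id) for t in tokens]

    def convert_ids_to_tokens(self, ids: list[int]) -> list[str]:
        return [self._ids_to_tokens[i] for i in ids]

    def encode(self, text: str) -> list[int]:
        return self.convert_tokens_to_ids(text.split())

    def decode(self, ids: list[int]) -> str:
        return " ".join(self.convert_ids_to_tokens(ids))


tok = ToyWhitespaceTokenizer({"<unk>": 0, "hello": 1, "world": 2})
ids = tok.encode("hello brave world")
print(ids)              # [1, 0, 2]
print(tok.decode(ids))  # hello <unk> world
print(tok.vocab_size)   # 3
```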

API Changes

1. Direct tokenizer initialization with vocab and merges

Starting with v5, you can initialize blank, untrained tokenizers-backed tokenizers:

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer()

This tokenizer will therefore follow the definition of the LlamaTokenizer as defined in its class definition. It can then be trained on a corpus as can be seen in the tokenizers documentation.

These tokenizers can also be initialized from vocab and merges (if necessary), like the previous "slow" tokenizers:

from transformers import LlamaTokenizer

vocab = {"<unk>": 0, "<s>": 1, "</s>": 2, "h": 3, "e": 4, "l": 5, "o": 6, "he": 7, "ll": 8}
merges = [("h", "e"), ("l", "l")]  # both parts of each merge, and its result, must be in the vocab

tokenizer = LlamaTokenizer(vocab=vocab, merges=merges)

This tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab; therefore enabling the comparison of different pre-tokenizers, normalizers, etc.

⚠️ The vocab_file (as in, a path towards a file containing the vocabulary) cannot be used to initialize the LlamaTokenizer as loading from files is reserved to the from_pretrained method.
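For readers less familiar with what `merges` encodes, here is a minimal pure-Python sketch of how an ordered BPE merge list is applied to a single word: the adjacent pair with the best (lowest) rank is merged until no listed pair remains. `apply_bpe` is a simplified model of byte-pair encoding written for illustration, not the 🤗 tokenizers implementation, which also handles pre-tokenization, `▁` markers, byte fallback and caching.

```python
def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    ranks = {pair: i for i, pair in enumerate(merges)}  # earlier merges win
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no mergeable pair left
        # Merge every occurrence of the best pair, scanning left to right.
        merged: list[str] = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
print(apply_bpe("hello", merges))         # ['hello']
print(apply_bpe("hold", merges))          # ['h', 'o', 'l', 'd']
print(apply_bpe("yellow", [("l", "l")]))  # ['y', 'e', 'll', 'o', 'w']
```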

2. Simplified decoding API

The batch_decode and decode methods have been unified to mirror the behavior of the encode method. Both single and batch decoding now use the same decode method. See an example of the new behavior below:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("t5-small") 
inputs = ["hey how are you?", "fine"]
tokenizer.decode(tokenizer.encode(inputs))

Gives:

- 'hey how are you?</s> fine</s>'
+ ['hey how are you?</s>', 'fine</s>']

We expect encode and decode to behave as two sides of the same coin: encode, process, decode should work.

[!NOTE]
A common use-case would be: encode, model.generate, decode. However, using generate would return list[list[int]], which would then be incompatible with decode.
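The single-versus-batch dispatch can be sketched in a few lines of plain Python: a flat list of ints is treated as one sequence, a list of lists as a batch. `decode_ids` and the toy vocabulary are hypothetical; this only illustrates the shape of the behavior, not the transformers implementation (which, unlike this toy, does not insert a space before `</s>`).

```python
from __future__ import annotations

from collections.abc import Sequence

ID_TO_TOKEN = {0: "</s>", 1: "hey", 2: "how", 3: "are", 4: "you?", 5: "fine"}


def decode_ids(ids: Sequence[int] | Sequence[Sequence[int]]) -> str | list[str]:
    if ids and isinstance(ids[0], Sequence):  # batch: list[list[int]]
        return [decode_ids(seq) for seq in ids]
    return " ".join(ID_TO_TOKEN[i] for i in ids)  # single sequence: list[int]


print(decode_ids([1, 2, 3, 4, 0]))            # hey how are you? </s>
print(decode_ids([[1, 2, 3, 4, 0], [5, 0]]))  # ['hey how are you? </s>', 'fine </s>']
```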

3. Unified encoding API

The encode_plus method is deprecated in favor of the single __call__ method.

4. `apply_chat_template` returns `BatchEncoding`

Previously, apply_chat_template returned input_ids for backward compatibility. Starting with v5, it now consistently returns a BatchEncoding dict like other tokenizer methods.

# v5
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]

# Now returns BatchEncoding with input_ids, attention_mask, etc.
outputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
print(outputs.keys())  # dict_keys(['input_ids', 'attention_mask'])
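Code that has to run against both v4 and v5 can normalize the return value. This sketch relies only on BatchEncoding behaving like a mapping (it subclasses `collections.UserDict`); the helper name `chat_input_ids` is hypothetical, and plain dicts and lists stand in for real outputs so it runs without transformers installed.

```python
from collections.abc import Mapping
from typing import Any


def chat_input_ids(result: Any) -> Any:
    """Return input_ids from apply_chat_template output, in either v4 or v5 style."""
    if isinstance(result, Mapping):  # v5: BatchEncoding with input_ids, attention_mask, ...
        return result["input_ids"]
    return result  # v4 default: input_ids returned directly


v4_style = [[101, 7592, 102]]
v5_style = {"input_ids": [[101, 7592, 102]], "attention_mask": [[1, 1, 1]]}
print(chat_input_ids(v4_style) == chat_input_ids(v5_style))  # True
```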

5. Removed legacy configuration file saving:

We simplify the serialization of tokenization attributes:

special_tokens_map.json - special tokens are now stored in tokenizer_config.json.
added_tokens.json - added tokens are now stored in tokenizer.json.
added_tokens_decoder is only stored when there is no tokenizer.json.

When loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. We're gradually moving towards consolidating attributes to fewer files so that other libraries and implementations may depend on them more reliably.

6. Model-Specific Changes

Several models that had identical tokenizers now import from their base implementation:

LayoutLM → uses BertTokenizer
LED → uses BartTokenizer
Longformer → uses RobertaTokenizer
LXMert → uses BertTokenizer
MT5 → uses T5Tokenizer
MVP → uses BartTokenizer

These modules will eventually be removed altogether.

Removed T5-specific workarounds

The internal _eventually_correct_t5_max_length method has been removed. T5 tokenizers now handle max length consistently with other models.

Testing Changes

A few testing changes specific to tokenizers have been applied:

Model-specific tokenization test files now focus on integration tests.
Common tokenization API tests (e.g., add_tokens, encode, decode) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior.

For legacy implementations, the original BERT Python tokenizer code (including WhitespaceTokenizer, BasicTokenizer, etc.) is preserved in bert_legacy.py for reference purposes.

7. Deprecated / Modified Features

Special Tokens Structure:

SpecialTokensMixin: Merged into PreTrainedTokenizerBase to simplify the tokenizer architecture.
special_tokens_map: Now only stores named special token attributes (e.g., bos_token, eos_token). Use extra_special_tokens for additional special tokens (formerly additional_special_tokens). all_special_tokens includes both named and extra tokens.

# v4
tokenizer.special_tokens_map  # Included 'additional_special_tokens'

# v5
tokenizer.special_tokens_map  # Only named tokens
tokenizer.extra_special_tokens  # Additional tokens

special_tokens_map_extended and all_special_tokens_extended: Removed. Access AddedToken objects directly from _special_tokens_map or _extra_special_tokens if needed.
additional_special_tokens: Still accepted for backward compatibility but is automatically converted to extra_special_tokens.
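A plain-Python sketch of the split described above: named special tokens stay in the map, anything under the legacy `additional_special_tokens` key moves to the extra list, and the "all" list is the union of both. The function `split_special_tokens` and the example tokens are hypothetical; the sketch mirrors the documented layout, not transformers internals.

```python
NAMED = ("bos_token", "eos_token", "unk_token", "sep_token", "pad_token", "cls_token", "mask_token")


def split_special_tokens(v4_map: dict) -> tuple[dict[str, str], list[str], list[str]]:
    named = {k: v for k, v in v4_map.items() if k in NAMED}
    extra = list(v4_map.get("additional_special_tokens", []))
    # Union of named and extra tokens, de-duplicated with order preserved.
    all_special = list(dict.fromkeys([*named.values(), *extra]))
    return named, extra, all_special


v4_map = {"bos_token": "<s>", "eos_token": "</s>", "additional_special_tokens": ["<img>", "<tool>"]}
named, extra, all_special = split_special_tokens(v4_map)
print(named)        # {'bos_token': '<s>', 'eos_token': '</s>'}
print(extra)        # ['<img>', '<tool>']
print(all_special)  # ['<s>', '</s>', '<img>', '<tool>']
```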

Deprecated Methods:

sanitize_special_tokens(): Already deprecated in v4, removed in v5.
prepare_seq2seq_batch(): Deprecated; use __call__() with text_target parameter instead.

# v4
model_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)

# v5
model_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, padding="longest", truncation=True, return_tensors="pt")
# model_inputs["labels"] is filled automatically with the input_ids of the text_target encodings

BatchEncoding.words(): Deprecated; use word_ids() instead.

Removed Methods:

create_token_type_ids_from_sequences(): Removed from base class. Subclasses that need custom token type ID creation should implement this method directly.
clean_up_tokenization(): Removed from base class. Now defined at model class level for models that need it (e.g., PLBart, CLVP, Wav2Vec2).
prepare_for_model(), build_inputs_with_special_tokens(), truncate_sequences(): Moved from tokenization_utils_base.py to tokenization_python.py for PythonBackend tokenizers. TokenizersBackend provides model-ready input via tokenize() and encode(), so these methods are no longer needed in the base class.
_switch_to_input_mode(), _switch_to_target_mode(), as_target_tokenizer(): Removed from base class. Use __call__() with text_target parameter instead.

# v4
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)

# v5
labels = tokenizer(text_target=tgt_texts, ...)

parse_response(): Removed from base class.

Disclaimers for the RC0

PEFT + MoE:

Because we are switching away from the naive MoE implementation (an nn.ModuleList of experts), we currently have an issue with MoEs that have adapters. For more details see #42491 (comment).

We aim for this to be fixed and released in a following release candidate in the week that follows RC0.

Tensor parallel and Expert parallel + MoE

We are streamlining the MoE support with vLLM; while this is being implemented, tensor parallelism and expert parallelism aren't working as expected.
This is known and actively being worked on.

We aim for this to be fixed and released in a following release candidate in the week that follows RC0.

Custom pretrained models:

For anyone inheriting from a transformers PreTrainedModel, the weights are automatically initialized with the common scheme:

    @torch.no_grad()
    def _init_weights(self, module):
        """
        Initialize the weights. This is quite general on purpose, in the spirit of what we usually do. For more complex
        initialization scheme, it should be overridden by the derived `PreTrainedModel` class. In case a model adds an explicit
        `nn.Parameter`, this method should also be overridden in order to initialize it correctly.
        """
        if hasattr(self.config, "initializer_range"):
            std = self.config.initializer_range or 0.02
        elif hasattr(self.config, "init_std"):
            std = self.config.init_std
        elif hasattr(self.config, "initializer_factor"):
            std = self.config.initializer_factor
        else:
            # 0.02 is the standard default value across the library
            std = getattr(self.config.get_text_config(), "initializer_range", 0.02)

        if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose1d, nn.ConvTranspose2d)):
            if getattr(module, "weight", None) is not None:
                init.normal_(module.weight, mean=0.0, std=std)
            if getattr(module, "bias", None) is not None:
                init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            if getattr(module, "weight", None) is not None:
                init.normal_(module.weight, mean=0.0, std=std)
                # Here we need the check explicitly, as we slice the weight in the `zeros_` call, so it loses the flag
                if module.padding_idx is not None and not getattr(module.weight, "_is_hf_initialized", False):
                    init.zeros_(module.weight[module.padding_idx])
        elif isinstance(module, nn.MultiheadAttention):
            # This uses torch's original init
            module._reset_parameters()
        # We cannot use `isinstance` on the RMSNorms or LayerNorms, as they usually are custom modules which change names
        # between modelings (because they are prefixed with the model name)
        elif (
            isinstance(module, (nn.GroupNorm, nn.BatchNorm1d, nn.BatchNorm2d

Update dependency transformers to v5

35f5081

dev-mend-for-github-com[bot] added the "security fix" label (Security fix generated by Mend) on May 2, 2026

dev-mend-for-github-com[bot] wants to merge 1 commit into main from whitesource-remediate/transformers-5.x

dev-mend-for-github-com Bot commented May 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Conversation

dev-mend-for-github-com Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Release Notes

v5.0.0rc3: Release candidate v5.0.0rc3

Release candidate v5.0.0rc3

New models:

What's Changed

New Contributors

v5.0.0rc2: Release candidate 5.0.0rc2

What's Changed

MoEs and performances:

Tokenization:

Core

New models

Quantization

Breaking changes

New Contributors

v5.0.0rc1: Release candidate 5.0.0rc1

What's Changed

Dynamic weight loader updates:

New models:

Some notable quantization fixes:

Peft:

Misc

New Contributors

v5.0.0rc0: Transformers v5.0.0rc0

Transformers v5 release notes

Highlights

Significant API changes

Dynamic weight loading

Tokenization

Backend Architecture Changes: moving away from the slow/fast tokenizer separation

Defining a tokenizers outside of the existing backends

API Changes

1. Direct tokenizer initialization with vocab and merges

2. Simplified decoding API

3. Unified encoding API

4. apply_chat_template returns BatchEncoding

5. Removed legacy configuration file saving:

6. Model-Specific Changes

Testing Changes

7. Deprecated / Modified Features

Disclaimers for the RC0

PEFT + MoE:

Tensor parallel and Expert parallel + MoE

Custom pretrained models:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

dev-mend-for-github-com Bot commented May 2, 2026 •

edited

Loading

`v5.0.0rc3`: Release candidate v5.0.0rc3

`v5.0.0rc2`: Release candidate 5.0.0rc2

`v5.0.0rc1`: Release candidate 5.0.0rc1

`v5.0.0rc0`: Transformers v5.0.0rc0

4. `apply_chat_template` returns `BatchEncoding`