feat: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI #131
Conversation
@copilot When I attempt to run a generation with Qwen Image (Diffusers version), I get this stack trace:
The crash was caused by Fixed in |
@copilot Can you figure out where the WARNING in the fourth line of log output is coming from and remove it? |
The warning came from |
@copilot I'm getting the following error upon generating with the model: |
Fixed in |
@copilot I'm still getting tokenizer errors: |
... Fixed in |
@copilot |
Fixed in

- Processor loading: Replaced
- Reference images UI: Added
- txt2img without reference images: No warning or block when no reference images are set; the text encoder works with an empty list, so txt2img works fine.
@copilot New error: |
Fixed in

```python
try:
    from transformers import Qwen2_5_VLVideoProcessor as _VideoProcessorCls
except ImportError:
    _VideoProcessorCls = None

processor_kwargs: dict = {"tokenizer": tokenizer, "image_processor": image_processor}
if _VideoProcessorCls is not None:
    processor_kwargs["video_processor"] = _VideoProcessorCls()
processor = Qwen2_5_VLProcessor(**processor_kwargs)
```
I'm still getting an error about a missing video processor. There is no video processor model in the Qwen Image Edit Diffusers model that I am testing with: |
Fixed in

```python
_VideoProcessorCls = None
try:
    from transformers import Qwen2_5_VLVideoProcessor as _VideoProcessorCls
except ImportError:
    # Older transformers releases may not export the class at the top level;
    # fall back to looking it up on the processor's own module.
    import importlib as _importlib
    import sys as _sys

    _proc_mod_name = Qwen2_5_VLProcessor.__module__
    _proc_mod = _sys.modules.get(_proc_mod_name) or _importlib.import_module(_proc_mod_name)
    _VideoProcessorCls = getattr(_proc_mod, "Qwen2_5_VLVideoProcessor", None)
```
@copilot |
Adds full support for the Qwen Image Edit 2511 model architecture, including both the diffusers version (Qwen/Qwen-Image-Edit-2511) and GGUF quantized versions (unsloth/Qwen-Image-Edit-2511-GGUF).

Backend changes:
- Add QwenImageEdit base model type to taxonomy
- Add diffusers and GGUF model config classes with detection logic
- Add model loader for diffusers and GGUF formats
- Add 5 invocation nodes: model loader, text/vision encoder, denoise, image-to-latents, latents-to-image
- Add QwenVLEncoderField for Qwen2.5-VL vision-language encoder
- Add QwenImageEditConditioningInfo and conditioning field
- Add generation modes and step callback support
- Add 5 starter models (full diffusers + Q2_K, Q4_K_M, Q6_K, Q8_0 GGUF)

Frontend changes:
- Add graph builder for linear UI generation
- Register in canvas and generate enqueue hooks
- Update type definitions, optimal dimensions, grid sizes
- Add readiness validation, model picker grouping, clip skip config
- Regenerate OpenAPI schema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use AutoProcessor.from_pretrained to load Qwen VL processor correctly

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/4d4417be-0f61-4faa-a21c-16e9ce81fec7

chore: bump diffusers==0.37.1

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/38a76809-d9a3-40f1-b5b3-fb56342e8e90

fix: handle multiple reference images

feat: add text encoder selection to advanced section for Qwen Image Edit

feat: complete Qwen Image Edit pipeline with LoRA, GGUF, quantization, and UI support

Major additions:
- LoRA support: loader invocation, config detection, conversion utils, prefix constants, and LayerPatcher integration in denoise with sidecar patching for GGUF models
- Lightning LoRA: starter models (4-step and 8-step bf16), shift override parameter for the distilled sigma schedule
- GGUF fixes: correct base class (ModelLoader), zero_cond_t=True, correct in_channels (no /4 division)
- Denoise: use FlowMatchEulerDiscreteScheduler directly, proper CFG gating (skip negative when cfg <= 1), reference latent pixel-space resize
- I2L: resize reference image to generation dimensions before VAE encoding
- Graph builder: wire LoRAs via collection loader, VAE-encode reference image as latents for spatial conditioning, pass shift/quantization params
- Frontend: shift override (checkbox + slider), LoRA graph wiring, scheduler hidden for Qwen Image Edit, model switching cleanup
- Starter model bundle for Qwen Image Edit
- LoRA config registered in discriminated union (factory.py)
- Downgrade transformers requirement back to >=4.56.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
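The CFG gating mentioned above (skip the negative pass when cfg <= 1) can be sketched as follows. All names here are illustrative, not taken from the PR; `model` stands in for the transformer's forward call.

```python
def guided_noise_pred(model, latents, t, cond, uncond, cfg_scale):
    """Classifier-free guidance step with gating: when cfg_scale <= 1 the
    negative (unconditioned) pass is skipped entirely, halving transformer
    calls per step, which is what distilled Lightning schedules expect.

    `model` is any callable (latents, t, conditioning) -> noise prediction.
    """
    noise_cond = model(latents, t, cond)
    if cfg_scale <= 1.0:
        return noise_cond
    noise_uncond = model(latents, t, uncond)
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)
```

With real tensors the arithmetic is identical; gating matters because the unconditioned pass doubles the per-step cost for no benefit at cfg <= 1.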
Force-pushed e897fa0 to bc82599
- GGUF loader: handle zero_cond_t absence in diffusers 0.36, try dtype before torch_dtype for forward compat
- Denoise: load scheduler config from disk with GGUF fallback, inline calculate_shift to avoid pipeline import, remove deprecated txt_seq_lens
- Text encoder: resize reference images to ~512x512 before VL encoding to prevent vision tokens from overwhelming the text prompt
- Picker badges: wrap to next line instead of truncating labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
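The inlined calculate_shift is presumably modeled on the flow-match shift helper found in diffusers' Flux-style pipelines; a minimal sketch, where the parameter names and default values are assumptions based on that helper rather than copied from this PR:

```python
def calculate_shift(
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
) -> float:
    """Linearly interpolate the timestep-shift parameter (mu) from the image
    token count, so larger images get a stronger sigma-schedule shift."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```

Inlining a ~5-line pure function avoids importing the whole pipeline module just for this computation.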
- Remove module-level cache for quantized encoders: load fresh each invocation and free VRAM via cleanup callback (gc + empty_cache)
- Suppress harmless BnB MatMul8bitLt bfloat16→float16 cast warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
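The load-fresh-and-free pattern can be sketched as below. The function name and structure are assumptions (the PR wires this through InvokeAI's cleanup-callback mechanism), and the import is guarded so the sketch stays self-contained without torch installed.

```python
import gc

try:
    import torch
except ImportError:  # the sketch still illustrates the flow without torch
    torch = None


def free_encoder_vram(encoder) -> None:
    """Hypothetical cleanup callback: drop the last reference to a quantized
    encoder, then reclaim host memory (gc) and the CUDA caching-allocator
    pool (empty_cache) so the next invocation starts from a clean slate."""
    del encoder
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
```

Dropping the module-level cache trades reload time for predictable VRAM usage; without the explicit `empty_cache`, freed blocks stay in the allocator pool and look like a leak in `nvidia-smi`.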
Summary
Complete implementation of the Qwen Image Edit 2511 pipeline for InvokeAI, including text-to-image generation, image editing with reference images, LoRA support (including Lightning distillation), GGUF quantized transformers, and BitsAndBytes encoder quantization.
Key Features
Backend Changes
- Denoise: zero_cond_t modulation, LoRA application via LayerPatcher with sidecar patching for GGUF, shift override for Lightning
- GGUF loader: correct base class (ModelLoader), zero_cond_t=True, correct in_channels
- transformers requirement kept at >=4.56.0 (the video processor fallback imports already handle this)

Frontend Changes
- qwenImageEditComponentSource, qwenImageEditQuantization, qwenImageEditShift in params slice with persistence and model-switch cleanup

Functional Testing Guide
1. Text-to-Image Generation (Basic)
2. GGUF Quantized Transformer
3. BitsAndBytes Encoder Quantization
4. LoRA Support
5. Image Editing with Reference Image
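This step exercises the reference-image path, where inputs are resized to roughly 512x512 before VL encoding (per the commit notes above). A hypothetical dimension helper; the function name, target area, and multiple-of-8 snapping are all assumptions, not the PR's actual logic:

```python
def fit_within(width: int, height: int, target_area: int = 512 * 512, multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) so the area is ~target_area while preserving
    aspect ratio, snapping each side down to a multiple-of-`multiple` grid.
    Keeping the reference image near 512x512 limits how many vision tokens
    it contributes, so it cannot overwhelm the text prompt."""
    scale = (target_area / (width * height)) ** 0.5
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h
```

For example, a 2048x1024 reference would be encoded at a much smaller, aspect-preserving size rather than at full resolution.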
6. Multiple Reference Images
7. Model Switching Cleanup
🤖 Generated with Claude Code