
feat: migrate from Granite 3 to Granite 4 hybrid models#357

Merged
planetf1 merged 2 commits into generative-computing:main from planetf1:feat/issue-344
Feb 11, 2026

Conversation


@planetf1 planetf1 commented Jan 26, 2026

Migrate from Granite 3.x to Granite 4.0 Models

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Summary

This PR migrates Mellea from Granite 3.x to Granite 4.0 hybrid models across all backends, tests, and documentation. Note: HuggingFace tests remain on Granite 3.3 due to adapter availability constraints (see below).

Changes

Model Definitions (mellea/backends/model_ids.py)

  • Added Granite 4 hybrid model identifiers (see the sketch after this list):
    • IBM_GRANITE_4_HYBRID_MICRO (granite-4.0-h-micro)
    • IBM_GRANITE_4_HYBRID_TINY (granite-4.0-h-tiny)
    • IBM_GRANITE_4_HYBRID_SMALL (granite-4.0-h-small)
  • Restored IBM_GRANITE_4_MICRO_3B with per-backend model selection (Ollama: MICRO, Watsonx: SMALL)
  • Marked Granite 3 models as deprecated (kept for backward compatibility)
  • Added vision model: IBM_GRANITE_3_3_VISION_2B
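
For orientation, a minimal sketch of what the new entries could look like is below. This is illustrative only: mellea's actual ModelIdentifier type and fields may differ, and the Ollama tags other than granite4:micro-h (which appears in a test failure later in this thread) are assumptions.

# Illustrative sketch only -- not mellea's actual model_ids.py structure.
from dataclasses import dataclass

@dataclass(frozen=True)
class _ModelId:  # hypothetical stand-in for mellea's model identifier type
    hf_name: str      # Hugging Face repository name
    ollama_name: str  # Ollama tag (assumed, except granite4:micro-h)

IBM_GRANITE_4_HYBRID_MICRO = _ModelId("ibm-granite/granite-4.0-h-micro", "granite4:micro-h")
IBM_GRANITE_4_HYBRID_TINY = _ModelId("ibm-granite/granite-4.0-h-tiny", "granite4:tiny-h")
IBM_GRANITE_4_HYBRID_SMALL = _ModelId("ibm-granite/granite-4.0-h-small", "granite4:small-h")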

Backend Updates

  • WatsonxAIBackend: Default model → IBM_GRANITE_4_HYBRID_SMALL
  • All other backends: Use Granite 4 hybrid models in tests

Test Updates (19 files)

Migrated to Granite 4:

  • test/backends/test_watsonx.py
  • test/backends/test_ollama.py
  • test/backends/test_litellm_*.py (3 files)
  • test/backends/test_vllm*.py (2 files)
  • test/stdlib/components/*.py (8 files)
  • test/stdlib/requirements/*.py (3 files)

⚠️ Remains on Granite 3.3:

  • test/backends/test_huggingface.py - See "HuggingFace Test Exception" below

⚠️ Remains on Granite 3.2:

  • test/backends/test_vision_ollama.py - See "Vision Model Exception" below

Documentation Updates

  • docs/tutorial.md: Updated all examples to Granite 4
  • docs/alora.md: Updated training examples, added note about non-hybrid models for adapter training
  • docs/examples/*.py: Updated all example scripts

Test Infrastructure

  • Removed 48GB memory markers, since the Granite 4 micro models require only ~16GB (see the marker sketch after this list)
  • Fixed CI memory constraints by using MICRO models for Ollama tests
  • Restored per-backend model selection for IBM_GRANITE_4_MICRO_3B (matches upstream pattern)
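
As a concrete illustration of the marker change, here is a hedged before/after sketch. It assumes the heavy_ram and ollama marker names that appear in the commit messages further down this page; the test name is hypothetical.

# Hypothetical before/after sketch of the marker change described above.
import pytest

# Before: the test carried a large-memory marker sized for 8B-class models.
# @pytest.mark.heavy_ram
@pytest.mark.ollama        # still requires a running Ollama server
@pytest.mark.qualitative   # still an LLM-quality check, deselected by -m "not qualitative"
def test_instruct_micro():  # hypothetical test name
    ...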

HuggingFace Test Exception

HuggingFace tests remain on Granite 3.3 due to missing aLoRA adapters for Granite 4.

The HF tests require the requirement_check intrinsic adapter, which is only available for Granite 3.x models in ibm-granite/rag-intrinsics-lib. While ibm-granite/granite-lib-rag-r1.0 has Granite 4 support for RAG intrinsics (answerability, context_relevance, etc.), the core intrinsics needed for tests are not yet available.

Follow-up Issue: #359 tracks migration once Granite 4 adapters are released.

Vision Model Exception

Vision tests remain on granite3.2-vision due to Ollama compatibility issues.

The ibm/granite3.3-vision:2b model causes Ollama server crashes with a segmentation fault (null pointer dereference in the llama runner). Reverted to granite3.2-vision, which works reliably.

Follow-up Issue: #360 documents the crash with full stack traces and debugging information.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Local Testing

# Fast tests (skip LLM quality checks)
uv run pytest -m "not qualitative"

# Full test suite
uv run pytest

Test Results: 204 passed, 6 skipped, 69 deselected, 1 xpassed

CI Testing

All tests pass in CI with CICD=1 (skips qualitative markers).
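
The qualitative marker itself is visible on test_error_during_generate_with_lock in the traceback later in this thread. A minimal sketch of how CI could skip those tests when CICD=1 follows; the conftest hook shown is an assumption about the wiring, not the repository's actual configuration.

# Assumed conftest.py hook: skip qualitative tests when CICD=1.
import os
import pytest

def pytest_collection_modifyitems(config, items):
    if os.environ.get("CICD") == "1":
        skip_qual = pytest.mark.skip(reason="qualitative checks skipped in CI (CICD=1)")
        for item in items:
            if "qualitative" in item.keywords:
                item.add_marker(skip_qual)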

Related Issues

@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.


mergify bot commented Jan 26, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@planetf1
Contributor Author

Issue with HuggingFace tests

  • I moved to using Granite v4: first hybrid, then regular. However, the intrinsics repo doesn't yet have Granite 4 options.
    I'm checking the status of the intrinsics/aLoRA adapters (including for the hybrid models, which need extra parameters for Mamba).

@planetf1 planetf1 marked this pull request as ready for review January 26, 2026 15:43
@planetf1
Contributor Author

Looking at CI failures...

@planetf1 planetf1 force-pushed the feat/issue-344 branch 2 times, most recently from 1e70336 to 014c09d, on January 28, 2026 08:44
@planetf1
Contributor Author

Rebased onto upstream/main and squashed to a single commit (bc476b9).

Resolved conflicts - added pytest markers from upstream while preserving model selections.

Note: docs/examples/aLora/101_example.py has a pre-existing bug (broken since the Nov 2025 refactor, commit 1229206); a separate fix is needed.

@jakelorocco jakelorocco left a comment

Looks good! Let's make sure all the intrinsics/adapter tests still run (let me know if you need help testing those). Those tests won't run during the GitHub CI/CD, so we will need to run them manually.

Comment on lines 135 to 144
Contributor

Were you able to test whether this (and the other) intrinsic/adapter tests still work? I think in this case, at least, there's no requirement_check adapter trained for this model?

Contributor Author

I needed #397 to reliably run the tests in a suitable environment (useful when this is merged).

Temporarily I've cherry-picked the commit here and was able to run all the Hugging Face tests that are active (examples & tests). This required reverting some of the tests to the Granite 3.x models, as some were dependent on adapters that are not yet available. Issue #359 was already open to track that update.

I've also reverted this change. (I don't yet have a vLLM setup; that's another todo.)


planetf1 commented Feb 3, 2026

Will look at this tomorrow, as I now have a suitable environment.


planetf1 commented Feb 4, 2026

Suggestion: If we can agree/merge #397 I will then rebase this PR, resolve conflicts, and rerun the full suite locally + hugging face remotely.

@jakelorocco
Contributor

Suggestion: If we can agree/merge #397 I will then rebase this PR, resolve conflicts, and rerun the full suite locally + hugging face remotely.

I've approved it!


planetf1 commented Feb 5, 2026

Thanks @jakelorocco for the approval.

Rebased, so this will need a new review. I can run tests again if needed (ideally after #416 is merged).

@planetf1 planetf1 requested a review from jakelorocco February 5, 2026 17:48

planetf1 commented Feb 5, 2026

Test failing on:

FAILED test/backends/test_litellm_ollama.py::test_async_avalue - litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Ollama_chatException - {"error":"model 'granite4:micro-h' not found"}
= 5 failed, 177 passed, 103 skipped, 1 xpassed, 8 warnings in 518.46s (0:08:38) =

will investigate tomorrow.

@jakelorocco
Contributor

Thanks @jakelorocco for the approval.

Rebased, so this will need a new review. I can run tests again if needed (ideally after #416 is merged).

Added both to my review list. Will look today or tomorrow morning. Thank you!

@jakelorocco
Contributor

Test failing on:

FAILED test/backends/test_litellm_ollama.py::test_async_avalue - litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Ollama_chatException - {"error":"model 'granite4:micro-h' not found"}
= 5 failed, 177 passed, 103 skipped, 1 xpassed, 8 warnings in 518.46s (0:08:38) =

will investigate tomorrow.

@planetf1, when running Ollama through LiteLLM, you have to make sure the model has been pulled first; LiteLLM won't do that automatically. When I pulled the model first, I was able to get the test to run. I'm not certain why it only failed for that test, though.
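
In practice that means running ollama pull granite4:micro-h (the tag from the error message above) before the suite. A minimal sketch of how a session-scoped fixture could guarantee this is below; the fixture and its placement are assumptions, not part of this PR.

# Hypothetical fixture: pre-pull the Ollama model that LiteLLM will route to.
import subprocess

import pytest

OLLAMA_MODEL = "granite4:micro-h"  # tag taken from the failing test's error message

@pytest.fixture(scope="session", autouse=True)
def ensure_ollama_model_pulled():
    # LiteLLM does not pull Ollama models on demand, so fetch the model up front.
    subprocess.run(["ollama", "pull", OLLAMA_MODEL], check=True)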

@jakelorocco jakelorocco left a comment

lgtm assuming the remaining test issue is sorted

- Update to Granite 4 hybrid models where possible (non-intrinsic tests)
- Update watsonx backend to use IBM_GRANITE_4_HYBRID_SMALL as default
- Add note in alora.md: use non-hybrid models for adapter training
- Remove heavy_ram marker from tests using 3B models (only needed for 8B+)
- Update model_ids.py with Granite 4 model mappings and deprecation handling

test: add Ollama markers and improve test documentation

- Add @pytest.mark.ollama to tests requiring Ollama backend
- Update test/README.md with comprehensive marker documentation
- Update .gitignore for logs/ and pytest output files

test: revert intrinsics test to upstream/main model (granite-4.0-micro)

Reverting to match upstream/main to verify if granite-4.0-micro works.
Previous commit used granite-3.3-8b-instruct based on assumption that
Granite 4 adapters don't exist, but PR generative-computing#397 suggests granite-4.0-micro
may work. Testing to confirm.

fix: revert intrinsics examples to granite-4.0-micro (matching upstream/main)

All intrinsics examples were incorrectly changed to granite-3.3-2b-instruct
in commit 3b86b9e, but adapters don't exist for the 2B model. Reverting to
granite-4.0-micro which has adapters in ibm-granite/granite-lib-rag-r1.0.

This matches upstream/main and allows all intrinsics examples to run successfully.

fix: revert intrinsics.py to granite-3.3-8b-instruct (matching upstream/main)

The requirement_check adapter only exists for granite-3.3-{2b,8b}-instruct
models in ibm-granite/rag-intrinsics-lib, not for granite-4.0-micro.

Upstream/main uses granite-3.3-8b-instruct which has the required adapter.

fix: revert vLLM test to granite-3.3-8b-instruct (matching upstream/main)

The requirement_check adapter only exists for granite-3.3-{2b,8b}-instruct
models, not for granite-4.0-h-tiny. Upstream/main uses granite-3.3-8b-instruct
which has the required adapter in ibm-granite/rag-intrinsics-lib.
@planetf1
Contributor Author

Hugging Face tests mostly work locally, but I am seeing an exception from one test, which relates to error handling:

FAILED test/backends/test_huggingface.py::test_error_during_generate_with_lock

AttributeError: 'Exception' object has no attribute 'sequences'

During handling of the above exception, another exception occurred:

    async def test_error_during_generate_with_lock(backend) -> None:
        # ... test setup ...
        
        with pytest.raises(Exception, match="Oops!"):
>           await reg_mot.avalue()

test/backends/test_huggingface.py:530: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
mellea/core/base.py:251: in avalue
    await self.astream()
mellea/core/base.py:338: in astream
    await self._post_process(self)
mellea/backends/huggingface.py:921: in post_processing
    output_complete = mot._meta["hf_output"].sequences[0]
E   AttributeError: 'Exception' object has no attribute 'sequences'

test/backends/test_huggingface.py:530: AssertionError
E       AssertionError: Regex pattern did not match.
E         Expected regex: 'Oops!'
E         Actual message: "'Exception' object has no attribute 'sequences'"

= 1 failed, 35 passed, 3 skipped, 346 deselected, 1 xfailed, 11 warnings in 246.67s (0:04:06) =

This looks like an underlying issue (and one that affects all backends), so I opened #432 to track it.

@planetf1
Contributor Author

Here's the full test output:

=== STDOUT ===
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea-c
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 385 items / 346 deselected / 1 skipped / 39 selected

test/backends/test_huggingface.py::test_adapters PASSED                  [  2%]
test/backends/test_huggingface.py::test_system_prompt PASSED             [  5%]
test/backends/test_huggingface.py::test_constraint_lora_with_requirement PASSED [  7%]
test/backends/test_huggingface.py::test_constraint_lora_override PASSED  [ 10%]
test/backends/test_huggingface.py::test_constraint_lora_override_does_not_override_alora PASSED [ 12%]
test/backends/test_huggingface.py::test_llmaj_req_does_not_use_alora PASSED [ 15%]
test/backends/test_huggingface.py::test_instruct PASSED                  [ 17%]
test/backends/test_huggingface.py::test_multiturn PASSED                 [ 20%]
test/backends/test_huggingface.py::test_chat PASSED                      [ 23%]
test/backends/test_huggingface.py::test_format PASSED                    [ 25%]
test/backends/test_huggingface.py::test_generate_from_raw PASSED         [ 28%]
test/backends/test_huggingface.py::test_generate_from_raw_with_format PASSED [ 30%]
test/backends/test_huggingface.py::test_async_parallel_requests PASSED   [ 33%]
test/backends/test_huggingface.py::test_async_avalue PASSED              [ 35%]
test/backends/test_huggingface.py::test_generate_with_lock PASSED        [ 38%]
test/backends/test_huggingface.py::test_generate_with_lock_does_not_block_when_awaiting_value PASSED [ 41%]
test/backends/test_huggingface.py::test_error_during_generate_with_lock FAILED [ 43%]
test/backends/test_huggingface.py::test_assert_correct_adapters PASSED   [ 46%]
test/backends/test_huggingface_tools.py::test_tool PASSED                [ 48%]
test/stdlib/components/intrinsic/test_rag.py::test_answerability PASSED  [ 51%]
test/stdlib/components/intrinsic/test_rag.py::test_query_rewrite PASSED  [ 53%]
test/stdlib/components/intrinsic/test_rag.py::test_citations PASSED      [ 56%]
test/stdlib/components/intrinsic/test_rag.py::test_context_relevance PASSED [ 58%]
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection PASSED [ 61%]
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance PASSED [ 64%]
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance_classifier PASSED [ 66%]
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_positive PASSED [ 69%]
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_negative PASSED [ 71%]
test/stdlib/test_spans.py::test_lazy_spans PASSED                        [ 74%]
test/stdlib/test_spans.py::test_kv XFAIL (Model safety refusal despi...) [ 76%]
docs/examples/aLora/101_example.py::101_example.py SKIPPED (uncondit...) [ 79%]
docs/examples/intrinsics/answer_relevance.py::answer_relevance.py PASSED [ 82%]
docs/examples/intrinsics/answerability.py::answerability.py PASSED       [ 84%]
docs/examples/intrinsics/citations.py::citations.py PASSED               [ 87%]
docs/examples/intrinsics/context_relevance.py::context_relevance.py PASSED [ 89%]
docs/examples/intrinsics/hallucination_detection.py::hallucination_detection.py PASSED [ 92%]
docs/examples/intrinsics/intrinsics.py::intrinsics.py PASSED             [ 94%]
docs/examples/intrinsics/query_rewrite.py::query_rewrite.py PASSED       [ 97%]
docs/examples/mify/rich_document_advanced.py::rich_document_advanced.py SKIPPED [100%]

=================================== FAILURES ===================================
_____________________ test_error_during_generate_with_lock _____________________

self = ModelOutputThunk()

    async def astream(self) -> str:
        """Returns the ModelOutputThunk's partial value including the next chunk(s). Can be used for both async streaming and async non-streaming.
    
        Returns the value of the ModelOutputThunk if streaming is done.
    
        **Note**: Be careful with calling this function. Only call it from one location at a time. This means you shouldn't pass a ModelOutputThunk to
        multiple coroutines/tasks and call astream from those coroutines/tasks simultaneously. We have considered solutions to this but are waiting until
        we see this error happen in a real use case.
    
        Raises:
            Exception: Propagates any errors from the underlying inference engine api request.
            RuntimeError: If called when the ModelOutputThunk's generate function is not async compatible.
        """
        if self._computed:
            assert self.value is not None  # If computed, the value cannot be None.
            return self.value
    
        if not self._generate_type == GenerateType.ASYNC:
            raise RuntimeError(
                f"Cannot use `ModelOutputThunk.astream()` when the generate function is using `{self._generate_type.name}`"
            )
        # Beginning value
        beginning_length = (
            0 if self._underlying_value is None else len(str(self._underlying_value))
        )  # type: ignore
    
        exception_to_raise = None
        try:
            # Type of the chunk depends on the backend.
            chunks: list[Any | None] = []
            while True:
                try:
                    item = self._async_queue.get_nowait()
                    chunks.append(item)
                except asyncio.QueueEmpty:
                    # We've exhausted the current items in the queue.
                    break
    
            # Make sure we always get the minimum chunk size.
            while len(chunks) <= self._chunk_size:
                if len(chunks) > 0:
                    if chunks[-1] is None or isinstance(chunks[-1], Exception):
                        break  # Hit sentinel value or an error.
                    # We could switch to relying on the `done` / `finish_reason` field of chunks,
                    # but that forces us to know about the chunk type here. Prefer sentinel values
                    # for now.
    
                item = await self._async_queue.get()
                chunks.append(item)
    
            # Process the sentinel value if it's there.
            if chunks[-1] is None:
                chunks.pop()  # Remove the sentinel value.
                self._computed = True
    
                # Shouldn't be needed, but cancel the Tasks this ModelOutputThunk relied on.
                if self._generate is not None:
                    self._generate.cancel()
                if self._generate_extra is not None:
                    # Covers an hf edge case. The task is done generating anything useful but isn't `done` yet.
                    await self._generate_extra
                    self._generate_extra.cancel()
    
                # If ModelOutputThunks get too bulky, we can do additional cleanup here
                # and set fields to None.
    
            elif isinstance(chunks[-1], Exception):
                # Mark as computed so post_process runs in finally block
                self._computed = True
                # Store exception to re-raise after cleanup
                exception_to_raise = chunks[-1]
    
            for chunk in chunks:
                assert self._process is not None
>               await self._process(self, chunk)

mellea/core/base.py:331: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <mellea.backends.huggingface.LocalHFBackend object at 0x14ef36ebe840>
mot = ModelOutputThunk(), chunk = Exception('Oops!')
input_ids = tensor([[49152,  2946, 49153, 39558,   390, 17071,  2821,    44, 30468,   225,
            36,    34,    36,    38,   ... 49153,  7656,     0,   203, 49152,   496, 49153,
          7656,     0,   203, 49152, 17594, 49153]], device='cuda:0')

    async def processing(
        self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput, input_ids
    ):
        """Process the returned chunks or the complete response."""
        if mot._underlying_value is None:
            mot._underlying_value = ""
    
        # Because we use the AsyncTextIteratorStreamer, streaming responses are of type str;
        # and already decoded.
        if isinstance(chunk, str):
            mot._underlying_value += chunk
        else:
            # Otherwise, it's a non-streaming request. Decode it here.
            mot._meta["hf_output"] = chunk
            mot._underlying_value += self._tokenizer.decode(
>               chunk.sequences[0, input_ids.shape[1] :], skip_special_tokens=True
                ^^^^^^^^^^^^^^^
            )
E           AttributeError: 'Exception' object has no attribute 'sequences'

mellea/backends/huggingface.py:896: AttributeError

During handling of the above exception, another exception occurred:

backend = <mellea.backends.huggingface.LocalHFBackend object at 0x14f20d968c50>

    @pytest.mark.qualitative
    async def test_error_during_generate_with_lock(backend) -> None:
        # Create local versions of these objects so that mocking
        # doesn't impact other functions. Don't do this in regular code,
        # the copying is complex.
        b: LocalHFBackend = copy(backend)
        model = copy(b._model)
        b._model = model
        b._model.set_adapter([])
        b._added_adapters = {}
        b._loaded_adapters = {}
        b.add_adapter(
            GraniteCommonAdapter("requirement_check", base_model_name=b.base_model_name)
        )
    
        regular_generate = b._model.generate
    
        def generate_and_raise_exc(*args, **kwargs):
            """Will generate like usual for the intrinsic request. Will fail for the regular generation request."""
            if "max_new_tokens" in kwargs:
                return regular_generate(*args, **kwargs)  # type: ignore
            raise Exception("Oops!")
    
        b._model.generate = Mock(side_effect=generate_and_raise_exc)
        assert not isinstance(backend._model, Mock), (
            "mocking went wrong; backend fixture changed; other tests may fail"
        )
    
        # Set up the inputs.
        ctx = ChatContext().add(Message("user", "hello"))
        act = CBlock("hello")
        req_intrinsic = Intrinsic("requirement_check", {"requirement": "did nothing"})
    
        reg_mot, _ = await b.generate_from_context(act, ctx)
        req_mot, _ = await b.generate_from_context(req_intrinsic, ctx)
    
        with pytest.raises(Exception, match="Oops!"):
>           await reg_mot.avalue()

test/backends/test_huggingface.py:531: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
mellea/core/base.py:251: in avalue
    await self.astream()
mellea/core/base.py:338: in astream
    await self._post_process(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <mellea.backends.huggingface.LocalHFBackend object at 0x14ef36ebe840>
mot = ModelOutputThunk()
conversation = [{'content': 'hello', 'role': 'user'}, {'content': 'hello', 'role': 'user'}]
_format = None, tool_calls = False, tools = {}, seed = None
input_ids = tensor([[49152,  2946, 49153, 39558,   390, 17071,  2821,    44, 30468,   225,
            36,    34,    36,    38,   ... 49153,  7656,     0,   203, 49152,   496, 49153,
          7656,     0,   203, 49152, 17594, 49153]], device='cuda:0')

    async def post_processing(
        self,
        mot: ModelOutputThunk,
        conversation: list[dict],
        _format: type[BaseModelSubclass] | None,
        tool_calls: bool,
        tools: dict[str, AbstractMelleaTool],
        seed,
        input_ids,
    ):
        """Called when generation is done."""
        if mot._meta.get("hf_output", None) is None:
            if mot._generate_extra is not None:
                full_output = await mot._generate_extra
                assert isinstance(full_output, GenerateDecoderOnlyOutput)
                mot._meta["hf_output"] = full_output
    
        # The ModelOutputThunk must be computed by this point.
        assert mot.value is not None
    
        # Add an entry to the cache for ALora reuse.
        if self._use_caches and mot._meta.get("hf_output", None) is not None:
>           output_complete = mot._meta["hf_output"].sequences[0]
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E           AttributeError: 'Exception' object has no attribute 'sequences'

mellea/backends/huggingface.py:921: AttributeError

During handling of the above exception, another exception occurred:

backend = <mellea.backends.huggingface.LocalHFBackend object at 0x14f20d968c50>

    @pytest.mark.qualitative
    async def test_error_during_generate_with_lock(backend) -> None:
        # Create local versions of these objects so that mocking
        # doesn't impact other functions. Don't do this in regular code,
        # the copying is complex.
        b: LocalHFBackend = copy(backend)
        model = copy(b._model)
        b._model = model
        b._model.set_adapter([])
        b._added_adapters = {}
        b._loaded_adapters = {}
        b.add_adapter(
            GraniteCommonAdapter("requirement_check", base_model_name=b.base_model_name)
        )
    
        regular_generate = b._model.generate
    
        def generate_and_raise_exc(*args, **kwargs):
            """Will generate like usual for the intrinsic request. Will fail for the regular generation request."""
            if "max_new_tokens" in kwargs:
                return regular_generate(*args, **kwargs)  # type: ignore
            raise Exception("Oops!")
    
        b._model.generate = Mock(side_effect=generate_and_raise_exc)
        assert not isinstance(backend._model, Mock), (
            "mocking went wrong; backend fixture changed; other tests may fail"
        )
    
        # Set up the inputs.
        ctx = ChatContext().add(Message("user", "hello"))
        act = CBlock("hello")
        req_intrinsic = Intrinsic("requirement_check", {"requirement": "did nothing"})
    
        reg_mot, _ = await b.generate_from_context(act, ctx)
        req_mot, _ = await b.generate_from_context(req_intrinsic, ctx)
    
>       with pytest.raises(Exception, match="Oops!"):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       AssertionError: Regex pattern did not match.
E         Expected regex: 'Oops!'
E         Actual message: "'Exception' object has no attribute 'sequences'"

test/backends/test_huggingface.py:530: AssertionError
----------------------------- Captured stderr call -----------------------------

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 13400.33it/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]
Fetching 3 files: 100%|██████████| 3/3 [00:00<00:00, 8583.16it/s]
=============================== warnings summary ===============================
docs/examples/conftest.py:191
  /proj/dmfexp/eiger/users/jonesn/mellea-c/docs/examples/conftest.py:191: PytestRemovedIn9Warning: The (path: py.path.local) argument is deprecated, please use (collection_path: pathlib.Path)
  see https://docs.pytest.org/en/latest/deprecations.html#py-path-local-arguments-for-hooks-replaced-with-pathlib-path
    def pytest_ignore_collect(collection_path, path, config):

docs/examples/conftest.py:225
  /proj/dmfexp/eiger/users/jonesn/mellea-c/docs/examples/conftest.py:225: PytestRemovedIn9Warning: The (path: py.path.local) argument is deprecated, please use (module_path: pathlib.Path)
  see https://docs.pytest.org/en/latest/deprecations.html#py-path-local-arguments-for-hooks-replaced-with-pathlib-path
    def pytest_pycollect_makemodule(module_path, path, parent):

test/backends/test_huggingface.py::test_constraint_lora_with_requirement
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

test/backends/test_huggingface.py::test_constraint_lora_with_requirement
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

test/backends/test_huggingface.py::test_generate_with_lock
test/stdlib/components/intrinsic/test_rag.py::test_query_rewrite
test/stdlib/components/intrinsic/test_rag.py::test_citations
test/stdlib/components/intrinsic/test_rag.py::test_context_relevance
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_positive
  /u/jonesn/.conda/envs/mellea/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== Skipped Examples ===============================
Examples with the following names were skipped because they cannot be easily run in the pytest framework; please run them manually:
mcp_example.py
mellea_pdf.py
simple_rag_with_filter.py
__init__.py
client.py
python_decompose_result.py
pii_serve.py
m_decomp_result.py
=========================== short test summary info ============================
FAILED test/backends/test_huggingface.py::test_error_during_generate_with_lock
= 1 failed, 35 passed, 3 skipped, 346 deselected, 1 xfailed, 11 warnings in 246.67s (0:04:06) =

------------------------------------------------------------
Sender: LSF System <lsfadmin@p4-r02-n1>
Subject: Job 453153: <___huggingface_tests__marker_> in cluster <BLUEVELA_LSF> Exited

Job <___huggingface_tests__marker_> was submitted from host <login3> by user <jonesn> in cluster <BLUEVELA_LSF> at Tue Feb 10 12:28:25 2026
Job was executed on host(s) <p4-r02-n1>, in queue <normal>, as user <jonesn> in cluster <BLUEVELA_LSF> at Tue Feb 10 12:28:26 2026
</u/jonesn> was used as the home directory.
</proj/dmfexp/eiger/users/jonesn/mellea-c> was used as the working directory.
Started at Tue Feb 10 12:28:26 2026
Terminated at Tue Feb 10 12:32:41 2026
Results reported at Tue Feb 10 12:32:41 2026

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True && source /opt/share/miniforge/etc/profile.d/conda.sh && conda activate mellea && pytest -m huggingface --no-cov -v
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   2474.00 sec.
    Max Memory :                                 31923 MB
    Average Memory :                             10266.48 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                752
    Run time :                                   262 sec.
    Turnaround time :                            256 sec.

The output (if any) is above this job summary.



PS:

Read file </proj/dmfexp/eiger/users/jonesn/mellea-c/logs/___huggingface_tests__marker__453153.stderr> for stderr output of this job.



=== STDERR ===
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute


@planetf1
Contributor Author

The CI error actually exhibits the same error-handling problem I raised in #432, albeit with the root cause here being a model not found. Fixed.

@planetf1
Contributor Author

CI & CUDA runs are clean.

@planetf1 planetf1 requested a review from jakelorocco February 10, 2026 13:23
@jakelorocco jakelorocco left a comment

lgtm; agree that the test issue is unrelated to this work

@planetf1 planetf1 added this pull request to the merge queue Feb 11, 2026
Merged via the queue into generative-computing:main with commit 8f9e18c Feb 11, 2026
4 checks passed


Development

Successfully merging this pull request may close these issues.

feat: Update tests & examples to use granite4
