
feat: migrate from Granite 3 to Granite 4 hybrid models#357

Merged
planetf1 merged 2 commits into generative-computing:main from planetf1:feat/issue-344
Feb 11, 2026

Conversation


@planetf1 planetf1 commented Jan 26, 2026

Migrate from Granite 3.x to Granite 4.0 Models

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Summary

This PR migrates Mellea from Granite 3.x to Granite 4.0 hybrid models across all backends, tests, and documentation. Note: HuggingFace tests remain on Granite 3.3 due to adapter availability constraints (see below).

Changes

Model Definitions (mellea/backends/model_ids.py)

  • Added Granite 4 hybrid model identifiers (see the sketch after this list):
    • IBM_GRANITE_4_HYBRID_MICRO (granite-4.0-h-micro)
    • IBM_GRANITE_4_HYBRID_TINY (granite-4.0-h-tiny)
    • IBM_GRANITE_4_HYBRID_SMALL (granite-4.0-h-small)
  • Restored IBM_GRANITE_4_MICRO_3B with per-backend model selection (Ollama: MICRO, Watsonx: SMALL)
  • Marked Granite 3 models as deprecated (kept for backward compatibility)
  • Added vision model: IBM_GRANITE_3_3_VISION_2B
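
For orientation, a minimal sketch of what the new entries could look like is below. This is illustrative only: mellea's actual ModelIdentifier type and fields may differ, and the Ollama tags other than granite4:micro-h (which appears in a test failure later in this thread) are assumptions.

# Illustrative sketch only -- not mellea's actual model_ids.py structure.
from dataclasses import dataclass

@dataclass(frozen=True)
class _ModelId:  # hypothetical stand-in for mellea's model identifier type
    hf_name: str      # Hugging Face repository name
    ollama_name: str  # Ollama tag (assumed, except granite4:micro-h)

IBM_GRANITE_4_HYBRID_MICRO = _ModelId("ibm-granite/granite-4.0-h-micro", "granite4:micro-h")
IBM_GRANITE_4_HYBRID_TINY = _ModelId("ibm-granite/granite-4.0-h-tiny", "granite4:tiny-h")
IBM_GRANITE_4_HYBRID_SMALL = _ModelId("ibm-granite/granite-4.0-h-small", "granite4:small-h")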

Backend Updates

  • WatsonxAIBackend: Default model → IBM_GRANITE_4_HYBRID_SMALL
  • All other backends: Use Granite 4 hybrid models in tests

Test Updates (19 files)

Migrated to Granite 4:

  • test/backends/test_watsonx.py
  • test/backends/test_ollama.py
  • test/backends/test_litellm_*.py (3 files)
  • test/backends/test_vllm*.py (2 files)
  • test/stdlib/components/*.py (8 files)
  • test/stdlib/requirements/*.py (3 files)

⚠️ Remains on Granite 3.3:

  • test/backends/test_huggingface.py - See "HuggingFace Test Exception" below

⚠️ Remains on Granite 3.2:

  • test/backends/test_vision_ollama.py - See "Vision Model Exception" below

Documentation Updates

  • docs/tutorial.md: Updated all examples to Granite 4
  • docs/alora.md: Updated training examples, added note about non-hybrid models for adapter training
  • docs/examples/*.py: Updated all example scripts

Test Infrastructure

  • Removed 48GB memory markers, since the Granite 4 micro models require only ~16GB (see the marker sketch after this list)
  • Fixed CI memory constraints by using MICRO models for Ollama tests
  • Restored per-backend model selection for IBM_GRANITE_4_MICRO_3B (matches upstream pattern)
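
As a concrete illustration of the marker change, here is a hedged before/after sketch. It assumes the heavy_ram and ollama marker names that appear in the commit messages further down this page; the test name is hypothetical.

# Hypothetical before/after sketch of the marker change described above.
import pytest

# Before: the test carried a large-memory marker sized for 8B-class models.
# @pytest.mark.heavy_ram
@pytest.mark.ollama        # still requires a running Ollama server
@pytest.mark.qualitative   # still an LLM-quality check, deselected by -m "not qualitative"
def test_instruct_micro():  # hypothetical test name
    ...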

HuggingFace Test Exception

HuggingFace tests remain on Granite 3.3 due to missing aLoRA adapters for Granite 4.

The HF tests require the requirement_check intrinsic adapter, which is only available for Granite 3.x models in ibm-granite/rag-intrinsics-lib. While ibm-granite/granite-lib-rag-r1.0 has Granite 4 support for RAG intrinsics (answerability, context_relevance, etc.), the core intrinsics needed for tests are not yet available.

Follow-up Issue: #359 tracks migration once Granite 4 adapters are released.

Vision Model Exception

Vision tests remain on granite3.2-vision due to Ollama compatibility issues.

The ibm/granite3.3-vision:2b model causes Ollama server crashes with a segmentation fault (null pointer dereference in the llama runner). Reverted to granite3.2-vision, which works reliably.

Follow-up Issue: #360 documents the crash with full stack traces and debugging information.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Local Testing

# Fast tests (skip LLM quality checks)
uv run pytest -m "not qualitative"

# Full test suite
uv run pytest

Test Results: 204 passed, 6 skipped, 69 deselected, 1 xpassed

CI Testing

All tests pass in CI with CICD=1 (skips qualitative markers).
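
The qualitative marker itself is visible on test_error_during_generate_with_lock in the traceback later in this thread. A minimal sketch of how CI could skip those tests when CICD=1 follows; the conftest hook shown is an assumption about the wiring, not the repository's actual configuration.

# Assumed conftest.py hook: skip qualitative tests when CICD=1.
import os
import pytest

def pytest_collection_modifyitems(config, items):
    if os.environ.get("CICD") == "1":
        skip_qual = pytest.mark.skip(reason="qualitative checks skipped in CI (CICD=1)")
        for item in items:
            if "qualitative" in item.keywords:
                item.add_marker(skip_qual)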

Related Issues

@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.


mergify bot commented Jan 26, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@planetf1
Contributor Author

Issue with HuggingFace tests

  • I moved to using Granite v4: first hybrid, then regular. However, the intrinsics repo doesn't yet have Granite 4 options.
    I'm checking the status of the intrinsics/aLoRA adapters (including for the hybrid models, which need extra parameters for Mamba).

@planetf1 planetf1 marked this pull request as ready for review January 26, 2026 15:43
@planetf1
Contributor Author

Looking at CI failures...

@planetf1 planetf1 force-pushed the feat/issue-344 branch 2 times, most recently from 1e70336 to 014c09d, on January 28, 2026 08:44
@planetf1
Contributor Author

Rebased onto upstream/main and squashed to a single commit (bc476b9).

Resolved conflicts - added pytest markers from upstream while preserving model selections.

Note: docs/examples/aLora/101_example.py has a pre-existing bug (broken since the Nov 2025 refactor, commit 1229206); a separate fix is needed.

@jakelorocco jakelorocco left a comment

Looks good! Let's make sure all the intrinsics/adapter tests still run (let me know if you need help testing those). Those tests won't run during the GitHub CI/CD, so we will need to run them manually.

Comment on lines 135 to 144
Contributor

Were you able to test whether this (and the other) intrinsic/adapter tests still work? I think in this case, at least, there's no requirement_check adapter trained for this model?

Contributor Author

I needed #397 to reliably run the tests in a suitable environment (useful when this is merged).

Temporarily I've cherry-picked the commit here and was able to run all the Hugging Face tests that are active (examples & tests). This required reverting some of the tests to the Granite 3.x models, as some were dependent on adapters that are not yet available. Issue #359 was already open to track that update.

I've also reverted this change. (I don't yet have a vLLM setup; that's another todo.)


planetf1 commented Feb 3, 2026

Will look at this tomorrow, as I now have a suitable environment.


planetf1 commented Feb 4, 2026

Suggestion: If we can agree/merge #397 I will then rebase this PR, resolve conflicts, and rerun the full suite locally + hugging face remotely.

@jakelorocco
Contributor

Suggestion: If we can agree/merge #397 I will then rebase this PR, resolve conflicts, and rerun the full suite locally + hugging face remotely.

I've approved it!


planetf1 commented Feb 5, 2026

Thanks @jakelorocco for the approval.

Rebased, so this will need a new review. I can run tests again if needed (ideally after #416 is merged).

@planetf1 planetf1 requested a review from jakelorocco February 5, 2026 17:48

planetf1 commented Feb 5, 2026

Test failing on:

FAILED test/backends/test_litellm_ollama.py::test_async_avalue - litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Ollama_chatException - {"error":"model 'granite4:micro-h' not found"}
= 5 failed, 177 passed, 103 skipped, 1 xpassed, 8 warnings in 518.46s (0:08:38) =

will investigate tomorrow.

@jakelorocco
Contributor

Thanks @jakelorocco for the approval.

Rebased, so this will need a new review. I can run tests again if needed (ideally after #416 is merged).

Added both to my review list. Will look today or tomorrow morning. Thank you!

@jakelorocco
Contributor

Test failing on:

FAILED test/backends/test_litellm_ollama.py::test_async_avalue - litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Ollama_chatException - {"error":"model 'granite4:micro-h' not found"}
= 5 failed, 177 passed, 103 skipped, 1 xpassed, 8 warnings in 518.46s (0:08:38) =

will investigate tomorrow.

@planetf1, when running Ollama through LiteLLM, you have to make sure the model has been pulled first; LiteLLM won't do that automatically. When I pulled the model first, I was able to get the test to run. I'm not certain why it only failed for that test, though.
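
In practice that means running ollama pull granite4:micro-h (the tag from the error message above) before the suite. A minimal sketch of how a session-scoped fixture could guarantee this is below; the fixture and its placement are assumptions, not part of this PR.

# Hypothetical fixture: pre-pull the Ollama model that LiteLLM will route to.
import subprocess

import pytest

OLLAMA_MODEL = "granite4:micro-h"  # tag taken from the failing test's error message

@pytest.fixture(scope="session", autouse=True)
def ensure_ollama_model_pulled():
    # LiteLLM does not pull Ollama models on demand, so fetch the model up front.
    subprocess.run(["ollama", "pull", OLLAMA_MODEL], check=True)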

@jakelorocco jakelorocco left a comment

lgtm assuming the remaining test issue is sorted

- Update to Granite 4 hybrid models where possible (non-intrinsic tests)
- Update watsonx backend to use IBM_GRANITE_4_HYBRID_SMALL as default
- Add note in alora.md: use non-hybrid models for adapter training
- Remove heavy_ram marker from tests using 3B models (only needed for 8B+)
- Update model_ids.py with Granite 4 model mappings and deprecation handling

test: add Ollama markers and improve test documentation

- Add @pytest.mark.ollama to tests requiring Ollama backend
- Update test/README.md with comprehensive marker documentation
- Update .gitignore for logs/ and pytest output files

test: revert intrinsics test to upstream/main model (granite-4.0-micro)

Reverting to match upstream/main to verify if granite-4.0-micro works.
Previous commit used granite-3.3-8b-instruct based on assumption that
Granite 4 adapters don't exist, but PR generative-computing#397 suggests granite-4.0-micro
may work. Testing to confirm.

fix: revert intrinsics examples to granite-4.0-micro (matching upstream/main)

All intrinsics examples were incorrectly changed to granite-3.3-2b-instruct
in commit 3b86b9e, but adapters don't exist for the 2B model. Reverting to
granite-4.0-micro which has adapters in ibm-granite/granite-lib-rag-r1.0.

This matches upstream/main and allows all intrinsics examples to run successfully.

fix: revert intrinsics.py to granite-3.3-8b-instruct (matching upstream/main)

The requirement_check adapter only exists for granite-3.3-{2b,8b}-instruct
models in ibm-granite/rag-intrinsics-lib, not for granite-4.0-micro.

Upstream/main uses granite-3.3-8b-instruct which has the required adapter.

fix: revert vLLM test to granite-3.3-8b-instruct (matching upstream/main)

The requirement_check adapter only exists for granite-3.3-{2b,8b}-instruct
models, not for granite-4.0-h-tiny. Upstream/main uses granite-3.3-8b-instruct
which has the required adapter in ibm-granite/rag-intrinsics-lib.
@planetf1
Contributor Author

Hugging Face tests mostly work locally, but I am seeing an exception from one test, which relates to error handling:

FAILED test/backends/test_huggingface.py::test_error_during_generate_with_lock

AttributeError: 'Exception' object has no attribute 'sequences'

During handling of the above exception, another exception occurred:

    async def test_error_during_generate_with_lock(backend) -> None:
        # ... test setup ...
        
        with pytest.raises(Exception, match="Oops!"):
>           await reg_mot.avalue()

test/backends/test_huggingface.py:530: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
mellea/core/base.py:251: in avalue
    await self.astream()
mellea/core/base.py:338: in astream
    await self._post_process(self)
mellea/backends/huggingface.py:921: in post_processing
    output_complete = mot._meta["hf_output"].sequences[0]
E   AttributeError: 'Exception' object has no attribute 'sequences'

test/backends/test_huggingface.py:530: AssertionError
E       AssertionError: Regex pattern did not match.
E         Expected regex: 'Oops!'
E         Actual message: "'Exception' object has no attribute 'sequences'"

= 1 failed, 35 passed, 3 skipped, 346 deselected, 1 xfailed, 11 warnings in 246.67s (0:04:06) =

This looks like an underlying issue (and one that affects all backends), so I opened #432 to track it.

@planetf1
Contributor Author

Here's the full test output:

=== STDOUT ===
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea-c
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 385 items / 346 deselected / 1 skipped / 39 selected

test/backends/test_huggingface.py::test_adapters PASSED                  [  2%]
test/backends/test_huggingface.py::test_system_prompt PASSED             [  5%]
test/backends/test_huggingface.py::test_constraint_lora_with_requirement PASSED [  7%]
test/backends/test_huggingface.py::test_constraint_lora_override PASSED  [ 10%]
test/backends/test_huggingface.py::test_constraint_lora_override_does_not_override_alora PASSED [ 12%]
test/backends/test_huggingface.py::test_llmaj_req_does_not_use_alora PASSED [ 15%]
test/backends/test_huggingface.py::test_instruct PASSED                  [ 17%]
test/backends/test_huggingface.py::test_multiturn PASSED                 [ 20%]
test/backends/test_huggingface.py::test_chat PASSED                      [ 23%]
test/backends/test_huggingface.py::test_format PASSED                    [ 25%]
test/backends/test_huggingface.py::test_generate_from_raw PASSED         [ 28%]
test/backends/test_huggingface.py::test_generate_from_raw_with_format PASSED [ 30%]
test/backends/test_huggingface.py::test_async_parallel_requests PASSED   [ 33%]
test/backends/test_huggingface.py::test_async_avalue PASSED              [ 35%]
test/backends/test_huggingface.py::test_generate_with_lock PASSED        [ 38%]
test/backends/test_huggingface.py::test_generate_with_lock_does_not_block_when_awaiting_value PASSED [ 41%]
test/backends/test_huggingface.py::test_error_during_generate_with_lock FAILED [ 43%]
test/backends/test_huggingface.py::test_assert_correct_adapters PASSED   [ 46%]
test/backends/test_huggingface_tools.py::test_tool PASSED                [ 48%]
test/stdlib/components/intrinsic/test_rag.py::test_answerability PASSED  [ 51%]
test/stdlib/components/intrinsic/test_rag.py::test_query_rewrite PASSED  [ 53%]
test/stdlib/components/intrinsic/test_rag.py::test_citations PASSED      [ 56%]
test/stdlib/components/intrinsic/test_rag.py::test_context_relevance PASSED [ 58%]
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection PASSED [ 61%]
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance PASSED [ 64%]
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance_classifier PASSED [ 66%]
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_positive PASSED [ 69%]
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_negative PASSED [ 71%]
test/stdlib/test_spans.py::test_lazy_spans PASSED                        [ 74%]
test/stdlib/test_spans.py::test_kv XFAIL (Model safety refusal despi...) [ 76%]
docs/examples/aLora/101_example.py::101_example.py SKIPPED (uncondit...) [ 79%]
docs/examples/intrinsics/answer_relevance.py::answer_relevance.py PASSED [ 82%]
docs/examples/intrinsics/answerability.py::answerability.py PASSED       [ 84%]
docs/examples/intrinsics/citations.py::citations.py PASSED               [ 87%]
docs/examples/intrinsics/context_relevance.py::context_relevance.py PASSED [ 89%]
docs/examples/intrinsics/hallucination_detection.py::hallucination_detection.py PASSED [ 92%]
docs/examples/intrinsics/intrinsics.py::intrinsics.py PASSED             [ 94%]
docs/examples/intrinsics/query_rewrite.py::query_rewrite.py PASSED       [ 97%]
docs/examples/mify/rich_document_advanced.py::rich_document_advanced.py SKIPPED [100%]

=================================== FAILURES ===================================
_____________________ test_error_during_generate_with_lock _____________________

self = ModelOutputThunk()

    async def astream(self) -> str:
        """Returns the ModelOutputThunk's partial value including the next chunk(s). Can be used for both async streaming and async non-streaming.
    
        Returns the value of the ModelOutputThunk if streaming is done.
    
        **Note**: Be careful with calling this function. Only call it from one location at a time. This means you shouldn't pass a ModelOutputThunk to
        multiple coroutines/tasks and call astream from those coroutines/tasks simultaneously. We have considered solutions to this but are waiting until
        we see this error happen in a real use case.
    
        Raises:
            Exception: Propagates any errors from the underlying inference engine api request.
            RuntimeError: If called when the ModelOutputThunk's generate function is not async compatible.
        """
        if self._computed:
            assert self.value is not None  # If computed, the value cannot be None.
            return self.value
    
        if not self._generate_type == GenerateType.ASYNC:
            raise RuntimeError(
                f"Cannot use `ModelOutputThunk.astream()` when the generate function is using `{self._generate_type.name}`"
            )
        # Beginning value
        beginning_length = (
            0 if self._underlying_value is None else len(str(self._underlying_value))
        )  # type: ignore
    
        exception_to_raise = None
        try:
            # Type of the chunk depends on the backend.
            chunks: list[Any | None] = []
            while True:
                try:
                    item = self._async_queue.get_nowait()
                    chunks.append(item)
                except asyncio.QueueEmpty:
                    # We've exhausted the current items in the queue.
                    break
    
            # Make sure we always get the minimum chunk size.
            while len(chunks) <= self._chunk_size:
                if len(chunks) > 0:
                    if chunks[-1] is None or isinstance(chunks[-1], Exception):
                        break  # Hit sentinel value or an error.
                    # We could switch to relying on the `done` / `finish_reason` field of chunks,
                    # but that forces us to know about the chunk type here. Prefer sentinel values
                    # for now.
    
                item = await self._async_queue.get()
                chunks.append(item)
    
            # Process the sentinel value if it's there.
            if chunks[-1] is None:
                chunks.pop()  # Remove the sentinel value.
                self._computed = True
    
                # Shouldn't be needed, but cancel the Tasks this ModelOutputThunk relied on.
                if self._generate is not None:
                    self._generate.cancel()
                if self._generate_extra is not None:
                    # Covers an hf edge case. The task is done generating anything useful but isn't `done` yet.
                    await self._generate_extra
                    self._generate_extra.cancel()
    
                # If ModelOutputThunks get too bulky, we can do additional cleanup here
                # and set fields to None.
    
            elif isinstance(chunks[-1], Exception):
                # Mark as computed so post_process runs in finally block
                self._computed = True
                # Store exception to re-raise after cleanup
                exception_to_raise = chunks[-1]
    
            for chunk in chunks:
                assert self._process is not None
>               await self._process(self, chunk)

mellea/core/base.py:331: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <mellea.backends.huggingface.LocalHFBackend object at 0x14ef36ebe840>
mot = ModelOutputThunk(), chunk = Exception('Oops!')
input_ids = tensor([[49152,  2946, 49153, 39558,   390, 17071,  2821,    44, 30468,   225,
            36,    34,    36,    38,   ... 49153,  7656,     0,   203, 49152,   496, 49153,
          7656,     0,   203, 49152, 17594, 49153]], device='cuda:0')

    async def processing(
        self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput, input_ids
    ):
        """Process the returned chunks or the complete response."""
        if mot._underlying_value is None:
            mot._underlying_value = ""
    
        # Because we use the AsyncTextIteratorStreamer, streaming responses are of type str;
        # and already decoded.
        if isinstance(chunk, str):
            mot._underlying_value += chunk
        else:
            # Otherwise, it's a non-streaming request. Decode it here.
            mot._meta["hf_output"] = chunk
            mot._underlying_value += self._tokenizer.decode(
>               chunk.sequences[0, input_ids.shape[1] :], skip_special_tokens=True
                ^^^^^^^^^^^^^^^
            )
E           AttributeError: 'Exception' object has no attribute 'sequences'

mellea/backends/huggingface.py:896: AttributeError

During handling of the above exception, another exception occurred:

backend = <mellea.backends.huggingface.LocalHFBackend object at 0x14f20d968c50>

    @pytest.mark.qualitative
    async def test_error_during_generate_with_lock(backend) -> None:
        # Create local versions of these objects so that mocking
        # doesn't impact other functions. Don't do this in regular code,
        # the copying is complex.
        b: LocalHFBackend = copy(backend)
        model = copy(b._model)
        b._model = model
        b._model.set_adapter([])
        b._added_adapters = {}
        b._loaded_adapters = {}
        b.add_adapter(
            GraniteCommonAdapter("requirement_check", base_model_name=b.base_model_name)
        )
    
        regular_generate = b._model.generate
    
        def generate_and_raise_exc(*args, **kwargs):
            """Will generate like usual for the intrinsic request. Will fail for the regular generation request."""
            if "max_new_tokens" in kwargs:
                return regular_generate(*args, **kwargs)  # type: ignore
            raise Exception("Oops!")
    
        b._model.generate = Mock(side_effect=generate_and_raise_exc)
        assert not isinstance(backend._model, Mock), (
            "mocking went wrong; backend fixture changed; other tests may fail"
        )
    
        # Set up the inputs.
        ctx = ChatContext().add(Message("user", "hello"))
        act = CBlock("hello")
        req_intrinsic = Intrinsic("requirement_check", {"requirement": "did nothing"})
    
        reg_mot, _ = await b.generate_from_context(act, ctx)
        req_mot, _ = await b.generate_from_context(req_intrinsic, ctx)
    
        with pytest.raises(Exception, match="Oops!"):
>           await reg_mot.avalue()

test/backends/test_huggingface.py:531: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
mellea/core/base.py:251: in avalue
    await self.astream()
mellea/core/base.py:338: in astream
    await self._post_process(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <mellea.backends.huggingface.LocalHFBackend object at 0x14ef36ebe840>
mot = ModelOutputThunk()
conversation = [{'content': 'hello', 'role': 'user'}, {'content': 'hello', 'role': 'user'}]
_format = None, tool_calls = False, tools = {}, seed = None
input_ids = tensor([[49152,  2946, 49153, 39558,   390, 17071,  2821,    44, 30468,   225,
            36,    34,    36,    38,   ... 49153,  7656,     0,   203, 49152,   496, 49153,
          7656,     0,   203, 49152, 17594, 49153]], device='cuda:0')

    async def post_processing(
        self,
        mot: ModelOutputThunk,
        conversation: list[dict],
        _format: type[BaseModelSubclass] | None,
        tool_calls: bool,
        tools: dict[str, AbstractMelleaTool],
        seed,
        input_ids,
    ):
        """Called when generation is done."""
        if mot._meta.get("hf_output", None) is None:
            if mot._generate_extra is not None:
                full_output = await mot._generate_extra
                assert isinstance(full_output, GenerateDecoderOnlyOutput)
                mot._meta["hf_output"] = full_output
    
        # The ModelOutputThunk must be computed by this point.
        assert mot.value is not None
    
        # Add an entry to the cache for ALora reuse.
        if self._use_caches and mot._meta.get("hf_output", None) is not None:
>           output_complete = mot._meta["hf_output"].sequences[0]
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E           AttributeError: 'Exception' object has no attribute 'sequences'

mellea/backends/huggingface.py:921: AttributeError

During handling of the above exception, another exception occurred:

backend = <mellea.backends.huggingface.LocalHFBackend object at 0x14f20d968c50>

    @pytest.mark.qualitative
    async def test_error_during_generate_with_lock(backend) -> None:
        # Create local versions of these objects so that mocking
        # doesn't impact other functions. Don't do this in regular code,
        # the copying is complex.
        b: LocalHFBackend = copy(backend)
        model = copy(b._model)
        b._model = model
        b._model.set_adapter([])
        b._added_adapters = {}
        b._loaded_adapters = {}
        b.add_adapter(
            GraniteCommonAdapter("requirement_check", base_model_name=b.base_model_name)
        )
    
        regular_generate = b._model.generate
    
        def generate_and_raise_exc(*args, **kwargs):
            """Will generate like usual for the intrinsic request. Will fail for the regular generation request."""
            if "max_new_tokens" in kwargs:
                return regular_generate(*args, **kwargs)  # type: ignore
            raise Exception("Oops!")
    
        b._model.generate = Mock(side_effect=generate_and_raise_exc)
        assert not isinstance(backend._model, Mock), (
            "mocking went wrong; backend fixture changed; other tests may fail"
        )
    
        # Set up the inputs.
        ctx = ChatContext().add(Message("user", "hello"))
        act = CBlock("hello")
        req_intrinsic = Intrinsic("requirement_check", {"requirement": "did nothing"})
    
        reg_mot, _ = await b.generate_from_context(act, ctx)
        req_mot, _ = await b.generate_from_context(req_intrinsic, ctx)
    
>       with pytest.raises(Exception, match="Oops!"):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       AssertionError: Regex pattern did not match.
E         Expected regex: 'Oops!'
E         Actual message: "'Exception' object has no attribute 'sequences'"

test/backends/test_huggingface.py:530: AssertionError
----------------------------- Captured stderr call -----------------------------

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 13400.33it/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]
Fetching 3 files: 100%|██████████| 3/3 [00:00<00:00, 8583.16it/s]
=============================== warnings summary ===============================
docs/examples/conftest.py:191
  /proj/dmfexp/eiger/users/jonesn/mellea-c/docs/examples/conftest.py:191: PytestRemovedIn9Warning: The (path: py.path.local) argument is deprecated, please use (collection_path: pathlib.Path)
  see https://docs.pytest.org/en/latest/deprecations.html#py-path-local-arguments-for-hooks-replaced-with-pathlib-path
    def pytest_ignore_collect(collection_path, path, config):

docs/examples/conftest.py:225
  /proj/dmfexp/eiger/users/jonesn/mellea-c/docs/examples/conftest.py:225: PytestRemovedIn9Warning: The (path: py.path.local) argument is deprecated, please use (module_path: pathlib.Path)
  see https://docs.pytest.org/en/latest/deprecations.html#py-path-local-arguments-for-hooks-replaced-with-pathlib-path
    def pytest_pycollect_makemodule(module_path, path, parent):

test/backends/test_huggingface.py::test_constraint_lora_with_requirement
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

test/backends/test_huggingface.py::test_constraint_lora_with_requirement
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

test/backends/test_huggingface.py::test_generate_with_lock
test/stdlib/components/intrinsic/test_rag.py::test_query_rewrite
test/stdlib/components/intrinsic/test_rag.py::test_citations
test/stdlib/components/intrinsic/test_rag.py::test_context_relevance
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_positive
  /u/jonesn/.conda/envs/mellea/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== Skipped Examples ===============================
Examples with the following names were skipped because they cannot be easily run in the pytest framework; please run them manually:
mcp_example.py
mellea_pdf.py
simple_rag_with_filter.py
__init__.py
client.py
python_decompose_result.py
pii_serve.py
m_decomp_result.py
=========================== short test summary info ============================
FAILED test/backends/test_huggingface.py::test_error_during_generate_with_lock
= 1 failed, 35 passed, 3 skipped, 346 deselected, 1 xfailed, 11 warnings in 246.67s (0:04:06) =

------------------------------------------------------------
Sender: LSF System <lsfadmin@p4-r02-n1>
Subject: Job 453153: <___huggingface_tests__marker_> in cluster <BLUEVELA_LSF> Exited

Job <___huggingface_tests__marker_> was submitted from host <login3> by user <jonesn> in cluster <BLUEVELA_LSF> at Tue Feb 10 12:28:25 2026
Job was executed on host(s) <p4-r02-n1>, in queue <normal>, as user <jonesn> in cluster <BLUEVELA_LSF> at Tue Feb 10 12:28:26 2026
</u/jonesn> was used as the home directory.
</proj/dmfexp/eiger/users/jonesn/mellea-c> was used as the working directory.
Started at Tue Feb 10 12:28:26 2026
Terminated at Tue Feb 10 12:32:41 2026
Results reported at Tue Feb 10 12:32:41 2026

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True && source /opt/share/miniforge/etc/profile.d/conda.sh && conda activate mellea && pytest -m huggingface --no-cov -v
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   2474.00 sec.
    Max Memory :                                 31923 MB
    Average Memory :                             10266.48 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                752
    Run time :                                   262 sec.
    Turnaround time :                            256 sec.

The output (if any) is above this job summary.



PS:

Read file </proj/dmfexp/eiger/users/jonesn/mellea-c/logs/___huggingface_tests__marker__453153.stderr> for stderr output of this job.



=== STDERR ===
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute


@planetf1
Contributor Author

The CI error actually exhibits the same error-handling problem I raised in #432, albeit with the root cause here being a model not found. Fixed.

@planetf1
Contributor Author

CI & CUDA runs are clean.

@planetf1 planetf1 requested a review from jakelorocco February 10, 2026 13:23
@jakelorocco jakelorocco left a comment

lgtm; agree that the test issue is unrelated to this work

@planetf1 planetf1 added this pull request to the merge queue Feb 11, 2026
Merged via the queue into generative-computing:main with commit 8f9e18c Feb 11, 2026
4 checks passed


Development

Successfully merging this pull request may close these issues.

feat: Update tests & examples to use granite4
