fix vl mem leak #1645

Open
hhaAndroid wants to merge 3 commits into InternLM:main from hhaAndroid:fix_vl_mem_leak

Conversation

Collaborator

@hhaAndroid hhaAndroid commented Mar 31, 2026

This PR fixes memory leaks in VL (vision-language) data loading by:

  • Using context managers for Image.open() and io.BytesIO() to ensure file handles are closed promptly
  • Adding a trim_memory() helper that calls malloc_trim(0) to return freed heap pages to the OS after processing multimodal/video items
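The two changes can be sketched roughly as follows. This is a minimal sketch, not the PR's actual diff; the load_image helper name and the libc.so.6 lookup path are assumptions for illustration:

```python
import ctypes
import io


def load_image(raw_bytes: bytes):
    # Lazy import so the sketch stays self-contained; the PR targets
    # environments where Pillow is already installed.
    from PIL import Image

    # Context managers close the BytesIO buffer and the file handle held
    # by Image.open() as soon as decoding finishes, instead of waiting
    # for garbage collection to release them.
    with io.BytesIO(raw_bytes) as buffer:
        with Image.open(buffer) as img:
            # Force a full decode before the buffer closes; Image.open()
            # is lazy and would otherwise read from a closed buffer later.
            return img.convert("RGB")


def trim_memory() -> None:
    # glibc-only: malloc_trim(0) returns freed heap pages to the OS.
    # On platforms without it (macOS, musl libc) this degrades to a no-op.
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except (OSError, AttributeError):
        pass
```

Without the eager convert/decode inside the `with` block, Pillow's lazy loading would try to read from an already-closed buffer later, which is why the PR pairs the context managers with an immediate decode.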

Collaborator Author

@claude review

Contributor

claude bot commented Mar 31, 2026

## Summary

This PR fixes memory leaks in VL (vision-language) data loading by:

  1. Using context managers for Image.open() and io.BytesIO() to ensure file handles are closed promptly
  2. Adding a trim_memory() helper that calls malloc_trim(0) to return freed heap pages to the OS after processing multimodal/video items

The resource cleanup changes (context managers) are correct and good practice. A few concerns:

## Issues

Warning

  • Incorrect type hint on trim_memory: The logger parameter is typed as logging.Logger but get_logger() returns a loguru.Logger. This will fail strict mypy checks.
  • Inverted dependency: trim_memory is a general utility but lives in qwen3_vl_utils.py. The base class module (base_mllm_tokenize_fn.py) should not import from a model-specific utils module. Consider moving it to a shared utils location.
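One way to address the type-hint concern, assuming get_logger() returns a loguru logger, is a TYPE_CHECKING guard so strict mypy sees the real type without the module importing loguru at runtime. A sketch, not the PR's code:

```python
from __future__ import annotations

import ctypes
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for static type checking, so strict mypy resolves
    # the real loguru type without a hard runtime dependency here.
    from loguru import Logger


def trim_memory(logger: Logger | None = None) -> None:
    # Best-effort malloc_trim; glibc-only, so fail quietly elsewhere.
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except (OSError, AttributeError) as exc:
        if logger is not None:
            logger.warning("malloc_trim unavailable: {}", exc)
```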

## Nit

  • Per-item malloc_trim overhead: trim_memory is called on every multimodal/video item. Worth benchmarking whether this syscall overhead matters at scale, or whether periodic trimming would be more efficient.
  • PR body is empty — a brief description of the root cause and fix would help reviewers.

## Verdict

COMMENT — the core fix (context managers) is solid. The trim_memory utility needs the type hint fix and should ideally live in a shared module.

_TRIM_MEMORY_WARNED = False


def trim_memory(logger: logging.Logger | None = None):
Collaborator

Having logger as a function argument seems weird; it reads as if we want to trim memory induced by the logger itself. Under what circumstances do we need a logger other than the global loguru logger returned by xtuner.v1.utils.get_logger?

data_name: str | None = None,
llm_pack_weight: float = 1.0,
visual_pack_weight: float = 0.0,
trim_memory_step: int = 1,
Collaborator

Might be better if we stick to the trainer naming convention: use ..._step as a counter, and ..._interval for specifying how often something is done.
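Following that convention, the periodic trimming suggested in the review could look like this. Hypothetical sketch under the naming suggestion above; MultimodalPacker, process_item, and trim_memory_interval are illustrative names, not the PR's API:

```python
import ctypes


def trim_memory() -> None:
    # Best-effort malloc_trim(0); no-op where glibc is unavailable.
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except (OSError, AttributeError):
        pass


class MultimodalPacker:
    def __init__(self, trim_memory_interval: int = 100):
        # "..._interval" specifies frequency; the running "..._step"
        # counter is internal state, per the trainer naming convention.
        self.trim_memory_interval = trim_memory_interval
        self._trim_memory_step = 0

    def process_item(self, item):
        # ... decode / pack the multimodal item here ...
        self._trim_memory_step += 1
        # Trim once per interval instead of after every item, amortizing
        # the syscall overhead raised in the review.
        if self._trim_memory_step % self.trim_memory_interval == 0:
            trim_memory()
        return item
```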
