convert_img, convert_a, and convert_video can emit Markdown that downstream parsers do not read back as the original HTML. The root cause is that markdownify drops raw values into Markdown link/image syntax without escaping them for that context. For img, a, and video that means attribute-backed values such as alt, src, title, href, and poster; for a and video it also covers the generated label text inside [...].
Confirmed on 1.2.2 and current develop (at the time of filing, markdownify/__init__.py was byte-identical on both).
Reproducer
```python
from markdownify import markdownify as md

print(repr(md('<img src="/a" alt="]">')))
# Output: '![]](/a)'
# Expected: '![\\]](/a)' (image preserved)
# Re-parse: renders as literal text, image destroyed

print(repr(md('<img src="/a b" alt="x">')))
# Output: '![x](/a b)'
# Expected: '![x](</a b>)' or URL-encoded ('![x](/a%20b)')
# Re-parse: literal text in 3 of 4 parsers

print(repr(md('<img src="/safe" alt="](http://attacker)">')))
# Output: '![](http://attacker)](/safe)'
# Re-parse: <img src="http://attacker" alt=""/>](/safe)
# attacker-controlled URL substituted, original destination left as trailing literal text

print(repr(md('<a href="/a)b">click</a>')))
# Output: '[click](/a)b)'
# Re-parse: <a href="/a">click</a>b)
```
I re-ran those outputs through Python-Markdown, Mistune, commonmark.py, and markdown-it-py. The delimiter-truncation and URL-substitution cases break in all four. The space-in-destination cases are accepted by Python-Markdown but rendered as literal text by the three CommonMark parsers.
That matches CommonMark §6.3 (links) and §6.4 (images): brackets in labels need to be escaped or balanced, raw destinations cannot contain spaces unless they are written as <...>, and an unescaped ) closes an unbalanced destination early.
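Those rules can be checked mechanically, which makes it easy to see why each reproducer value breaks. This is a simplified sketch of the CommonMark link rules (backslash escapes plus bracket and parenthesis balance only; no entities, code spans, or the spec's nesting limit), and the function names are hypothetical, not part of markdownify or any parser.

```python
import string

# Hypothetical helpers for illustration; they model a simplified subset of
# the CommonMark link rules (backslash escapes plus bracket and parenthesis
# balance) and are not part of markdownify or any Markdown parser.


def _escaped(text: str, i: int) -> bool:
    """True if text[i] is a backslash that escapes the next character."""
    # CommonMark only treats backslash + ASCII punctuation as an escape.
    return (text[i] == "\\" and i + 1 < len(text)
            and text[i + 1] in string.punctuation)


def label_is_balanced(label: str) -> bool:
    """True if every unescaped '[' and ']' in a link label is matched."""
    depth = 0
    i = 0
    while i < len(label):
        if _escaped(label, i):
            i += 2  # skip the escaped punctuation character
            continue
        if label[i] == "[":
            depth += 1
        elif label[i] == "]":
            depth -= 1
            if depth < 0:
                return False  # this ']' would close the label early
        i += 1
    return depth == 0


def raw_destination_is_safe(dest: str) -> bool:
    """True if dest can appear inside (...) without the <...> form."""
    if dest.startswith("<"):
        return False  # would be read as the start of the <...> form
    depth = 0
    i = 0
    while i < len(dest):
        if _escaped(dest, i):
            i += 2
            continue
        ch = dest[i]
        if ch == " " or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False  # spaces and ASCII control characters end it
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # an unbalanced ')' closes the link early
        i += 1
    return depth == 0
```

Run against the reproducer values, label_is_balanced rejects "]" and "](http://attacker)", and raw_destination_is_safe rejects "/a b" and "/a)b", which lines up with the failing cases in the reproducer.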
escape_misc=True is not a full workaround. It does not help for attribute-backed fields such as img alt/src/title, a href, or video src/poster, because those values bypass escape(). It does help when the broken piece is generated label text: for example, <a href="link">text]</a> becomes [text\]](link) with escape_misc=True.
convert_img is the clearest example: it pulls alt, src, and title directly from el.attrs and formats them into ![alt](src "title") without routing alt or src through escape().
Failing input patterns
The confirmed input shapes so far are unbalanced [ or ] in alt, ) or a space in src or href, and ](...) appearing in alt or link text. The last case is the URL-substitution variant: the embedded URL becomes the parsed destination and the original src/href is left behind as trailing literal text.
Security note
The ](http://...) case is the one I would call out separately because it can substitute an attacker-controlled URL into the parsed Markdown output. That seems relevant for any pipeline that treats markdownify output as a trusted source of destinations, including HTML-to-Markdown storage flows or LLM ingest pipelines. I am not filing this as a CVE; I just want the behavior on record.
I have not tested sanitizer behavior here, so I am not making a stronger mitigation claim in this issue body.
Affected functions
Affected code paths include convert_img for raw src, alt, and title; convert_a for raw href plus the surrounding [...] around generated link text; and convert_video for raw src, poster, fallback <source src>, and generated label text. The existing title.replace('"', r'\"') in convert_img is a partial version of the kind of context-aware escaping that is needed here.
Fix shape
If you want a PR, my preference would be a shared escape layer for Markdown labels, destinations, and titles, applied anywhere markdownify emits link/image syntax. A narrower delimiter-by-delimiter patch would fix the immediate repros, but it would keep the escaping rules fragmented across emitters and make this class of bug easy to reintroduce.
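A minimal sketch of that shared layer, assuming one escape function per Markdown context. All names here (escape_label, escape_destination, escape_title, format_link) are hypothetical, not existing markdownify API, and the rules cover only structural breakage: the cases from this issue plus code spans, raw HTML, and autolinks, which CommonMark binds more tightly than link brackets. escape_label is meant for raw attribute text such as alt; the generated label text in convert_a already contains Markdown (a nested image, for instance), so it would still need to go through escape() rather than this function.

```python
import re

# Hypothetical shared escape layer; none of these names exist in markdownify.
# Each function targets exactly one Markdown context: label, destination, title.

# Backslash, brackets, and the two characters that open constructs binding
# tighter than link brackets: '`' (code spans) and '<' (raw HTML, autolinks).
_LABEL_SPECIALS = re.compile(r"([\\\[\]`<])")
# Anything that can end, or change the meaning of, a raw (...) destination.
_NEEDS_POINTY = re.compile(r"[\s()<>\\]|[\x00-\x1f\x7f]")
# Characters that must be backslash-escaped inside the <...> form.
_POINTY_SPECIALS = re.compile(r"([\\<>])")


def escape_label(text: str) -> str:
    """Escape raw text (e.g. an alt attribute) for use inside [...]."""
    return _LABEL_SPECIALS.sub(r"\\\1", text)


def escape_destination(url: str) -> str:
    """Return a destination that a CommonMark parser reads back as url."""
    if not _NEEDS_POINTY.search(url):
        return url  # ordinary URLs are emitted unchanged
    # The <...> form allows spaces and unbalanced parens but not line
    # endings, so those are percent-encoded before wrapping.
    url = url.replace("\r", "%0D").replace("\n", "%0A")
    return "<" + _POINTY_SPECIALS.sub(r"\\\1", url) + ">"


def escape_title(title: str) -> str:
    """Wrap a title in double quotes with backslashes and quotes escaped."""
    # Escaping backslashes first means a title ending in a backslash cannot
    # escape the closing quote. Blank lines inside titles are not handled.
    return '"' + title.replace("\\", "\\\\").replace('"', '\\"') + '"'


def format_link(label: str, destination: str, title: str = "",
                image: bool = False) -> str:
    """Emit [label](dest "title"), or ![...] for images, with each part escaped."""
    title_part = " " + escape_title(title) if title else ""
    bang = "!" if image else ""
    return f"{bang}[{escape_label(label)}]({escape_destination(destination)}{title_part})"


print(format_link("]", "/a", image=True))                   # ![\]](/a)
print(format_link("](http://attacker)", "/safe", image=True))  # ![\](http://attacker)](/safe)
print(format_link("click", "/a)b"))                          # [click](</a)b>)
```

Falling back to <...> whenever a destination contains whitespace, a parenthesis, an angle bracket, or a backslash is deliberately conservative: ordinary URLs stay byte-identical, but balanced-paren URLs that were already valid also get wrapped, which a paren-balance check could avoid at the cost of more code.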