Image links not extracted correctly, resulting in empty ![]() in Markdown output #1177

Open
@KevinChen1994

Description

When using markitdown to convert a WeChat article to Markdown, the image URLs are not being extracted properly. The resulting Markdown output contains image tags like ![]() with empty URLs, causing the images to be missing from the final content.

Steps to Reproduce

from markitdown import MarkItDown
import requests

md = MarkItDown()

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36",
    "Accept": "application/json",
}

response = requests.get(
    "https://mp.weixin.qq.com/s/85a2235XkZPOevXW9HTXtg", headers=headers
)

result = md.convert(response)
print(result.text_content)

Actual Output

Stars began growing faster starting in February:
![]()
WeChat Index: starting in February, a sudden surge in traffic:
![]()

Expected Output

Image URLs should be correctly extracted and included in the Markdown output, e.g.:

![](https://mmbiz.qpic.cn/...)

Environment

OS: macOS

Python version: 3.12
markitdown version: latest

Additional Notes
The ![]() in the output suggests that the Markdown image syntax is being applied, but the image URLs are missing. This may be caused by changes in the structure of WeChat article pages, or possibly a case that isn't currently supported.
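
One possible explanation, offered as an assumption rather than something confirmed in this report: WeChat article pages tend to lazy-load images, putting the real URL in a `data-src` attribute and leaving `src` empty or absent. If that is what happens here, a converter that reads only `src` would emit `![]()`. The sketch below is a stdlib-only workaround that copies `data-src` into `src` before conversion. The function name `promote_lazy_src` is hypothetical and not part of markitdown, and the regex-based tag matching is deliberately simple (it will not cope with `>` inside attribute values).

```python
import re

# Matches a whole <img ...> tag (simplified: assumes no ">" inside attributes).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches data-src="..." or data-src='...'.
DATA_SRC = re.compile(r"""\bdata-src\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)
# Matches a plain src attribute; the lookbehind skips the "src" in "data-src".
SRC = re.compile(r"""(?<![-\w])src\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def promote_lazy_src(html: str) -> str:
    """Hypothetical helper: copy data-src into src where src is missing or empty."""

    def fix(match: re.Match) -> str:
        tag = match.group(0)
        lazy = DATA_SRC.search(tag)
        if lazy is None:
            return tag  # nothing to promote
        url = lazy.group(2)
        src = SRC.search(tag)
        if src is not None and src.group(2).strip():
            return tag  # a real src is already present; leave it alone
        if src is not None:
            # Replace the empty src attribute with the lazy-load URL.
            return tag[: src.start()] + f'src="{url}"' + tag[src.end():]
        # No src attribute at all: insert one right after "<img".
        return tag[:4] + f' src="{url}"' + tag[4:]

    return IMG_TAG.sub(fix, html)


if __name__ == "__main__":
    sample = '<p>chart</p><img data-src="https://example.com/a.png" class="c">'
    print(promote_lazy_src(sample))
```

To try it against the reproduction above, the rewritten HTML from `promote_lazy_src(response.text)` could be written to a temporary `.html` file and that path passed to `md.convert`. Whether this restores the image URLs depends on the assumption about `data-src` holding for the page in question.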

Would appreciate any help or fix for this — thanks for the great tool!
