# MarkItDown OCR Plugin

LLM Vision plugin for MarkItDown that extracts text from images embedded in PDF, DOCX, PPTX, and XLSX files.

Uses the same `llm_client` / `llm_model` pattern that MarkItDown already supports for image descriptions; no new ML libraries or binary dependencies are required.

## Features

- **Enhanced PDF Converter**: Extracts text from images within PDFs, with full-page OCR fallback for scanned documents
- **Enhanced DOCX Converter**: OCR for images in Word documents
- **Enhanced PPTX Converter**: OCR for images in PowerPoint presentations
- **Enhanced XLSX Converter**: OCR for images in Excel spreadsheets
- **Context Preservation**: Maintains document structure and flow when inserting extracted text

## Installation

```bash
pip install markitdown-ocr
```

The plugin uses whatever OpenAI-compatible client you already have. Install one if you don't have it yet:

```bash
pip install openai
```

## Usage

### Command Line

```bash
markitdown document.pdf --use-plugins --llm-client openai --llm-model gpt-4o
```

### Python API

Pass `llm_client` and `llm_model` to `MarkItDown()` exactly as you would for image descriptions:

```python
from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
)

result = md.convert("document_with_images.pdf")
print(result.text_content)
```

If no `llm_client` is provided, the plugin still loads, but OCR is silently skipped and conversion falls back to the standard built-in converters.

### Custom Prompt

Override the default extraction prompt for specialized documents:

```python
md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
    llm_prompt="Extract all text from this image, preserving table structure.",
)
```

### Any OpenAI-Compatible Client

Works with any client that follows the OpenAI API:

```python
from openai import AzureOpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=AzureOpenAI(
        api_key="...",
        azure_endpoint="https://your-resource.openai.azure.com/",
        api_version="2024-02-01",
    ),
    llm_model="gpt-4o",
)
```

## How It Works

When `MarkItDown(enable_plugins=True, llm_client=..., llm_model=...)` is called:

1. MarkItDown discovers the plugin via the `markitdown.plugin` entry point group
2. It calls `register_converters()`, forwarding all kwargs, including `llm_client` and `llm_model`
3. The plugin creates an `LLMVisionOCRService` from those kwargs
4. Four OCR-enhanced converters are registered at **priority -1.0**, ahead of the built-in converters at priority 0.0 (lower priority values are tried first)
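
The priority mechanics above can be sketched with a toy registry. Assuming MarkItDown tries converters in ascending priority order, a converter registered at -1.0 is consulted before one at 0.0. The class and registry names here are illustrative, not the plugin's actual internals:

```python
# Toy model of priority-based converter selection (illustrative names).
class Converter:
    def __init__(self, name):
        self.name = name
        self.priority = None

registry = []

def register_converter(converter, priority):
    """Record a converter along with its priority."""
    converter.priority = priority
    registry.append(converter)

# The built-in PDF converter sits at 0.0; the OCR-enhanced one at -1.0.
register_converter(Converter("BuiltinPdfConverter"), priority=0.0)
register_converter(Converter("OcrPdfConverter"), priority=-1.0)

# Sorting by priority puts the OCR converter first, so it gets the first
# chance to accept a PDF; the built-in converter remains as a fallback.
order = [c.name for c in sorted(registry, key=lambda c: c.priority)]
print(order)  # ['OcrPdfConverter', 'BuiltinPdfConverter']
```

Because the built-in converters are never removed, uninstalling the plugin restores the default behavior with no code changes.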

When a file is converted:

1. The OCR converter accepts the file
2. It extracts embedded images from the document
3. Each image is sent to the LLM with an extraction prompt
4. The returned text is inserted inline, preserving document structure
5. If the LLM call fails, conversion continues without that image's text
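
The fault tolerance in step 5 can be sketched as a per-image loop that never lets one failed LLM call abort the whole conversion. The `extract_fn` callable stands in for the vision API call; it is an assumption, not the plugin's real interface:

```python
import warnings

def ocr_images(images, extract_fn):
    """Return the extracted text per image, skipping images whose LLM call fails.

    `extract_fn` is a stand-in for the vision API call (hypothetical signature:
    image bytes in, extracted text out).
    """
    results = []
    for image_bytes in images:
        try:
            text = extract_fn(image_bytes)
        except Exception as exc:
            # An API failure for one image must not abort the conversion;
            # surface it as a warning and move on.
            warnings.warn(f"OCR failed for one image: {exc}")
            continue
        if text and text.strip():
            results.append(text.strip())
    return results
```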

## Supported File Formats

### PDF

- Embedded images are extracted by position (via `page.images` / page XObjects) and OCR'd inline, interleaved with the surrounding text in vertical reading order.
- **Scanned PDFs** (pages with no extractable text) are detected automatically: each page is rendered at 300 DPI and sent to the LLM as a full-page image.
- **Malformed PDFs** that pdfplumber/pdfminer cannot open (e.g. a truncated EOF) are retried with PyMuPDF page rendering, so content is still recovered.
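
The scanned-page detection described above comes down to a simple heuristic: if a page's text layer yields nothing, treat the page as a scan and render it for full-page OCR. A minimal sketch, where the character threshold is an illustrative assumption rather than the plugin's actual cutoff:

```python
def needs_full_page_ocr(page_text, min_chars=1):
    """True when a PDF page yields too little text to trust its text layer.

    `page_text` is whatever the text extractor returned for the page
    (possibly None); `min_chars` is an assumed, tunable threshold.
    """
    return len((page_text or "").strip()) < min_chars
```

In the real converter a `True` result would gate rendering the page to a 300 DPI image (e.g. via PyMuPDF) before sending it to the LLM.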

### DOCX

- Images are extracted via document part relationships (`doc.part.rels`).
- OCR runs before the DOCX→HTML→Markdown pipeline executes. Placeholder tokens are injected into the HTML so that the markdown converter does not escape the OCR markers; after conversion, the placeholders are replaced with the formatted `*[Image OCR]...[End OCR]*` blocks.
- Document flow (headings, paragraphs, tables) is fully preserved around the OCR blocks.
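
The placeholder round-trip can be sketched as follows. The token format is made up for illustration; the point is that an opaque alphanumeric token survives the HTML→Markdown conversion without being escaped, and is swapped for the real OCR block afterwards:

```python
def make_placeholder(i):
    """Opaque token for image `i`; plain alphanumerics survive markdown escaping."""
    return f"MDOCRPLACEHOLDER{i}X"

def restore_placeholders(markdown, ocr_texts):
    """Replace each placeholder token with its formatted OCR block."""
    for i, text in enumerate(ocr_texts):
        block = f"*[Image OCR]\n{text}\n[End OCR]*"
        markdown = markdown.replace(make_placeholder(i), block)
    return markdown
```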

### PPTX

- Picture shapes, placeholder shapes with images, and images inside groups are all supported.
- Shapes are processed per slide in top-to-bottom, left-to-right reading order.
- If an `llm_client` is configured, the LLM is asked for a description first; OCR is used as the fallback when no description is returned.
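
The per-slide reading order amounts to sorting shapes by their vertical offset first and horizontal offset second, assuming shapes expose `top`/`left` positions as python-pptx shapes do. A sketch using plain dicts in place of shape objects:

```python
def reading_order(shapes):
    """Sort shapes top-to-bottom, then left-to-right within a row.

    Each shape is assumed to expose `top` and `left` offsets (EMUs in
    python-pptx); dicts stand in for shape objects here.
    """
    return sorted(shapes, key=lambda s: (s["top"], s["left"]))
```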

### XLSX

- Images embedded in worksheets (`sheet._images`) are extracted per sheet.
- Cell position is calculated from the image anchor coordinates (column/row → Excel letter notation).
- Images are listed under a `### Images in this sheet:` section after the sheet's data table; they are not interleaved into the table rows.
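
The anchor-to-cell calculation can be sketched like this, assuming the anchor exposes zero-based column/row indices (as openpyxl image anchors do). The base-26 letter arithmetic handles columns past `Z`:

```python
def cell_ref(col, row):
    """Convert zero-based (col, row) anchor indices to an A1-style reference."""
    letters = ""
    col += 1  # switch to 1-based for the base-26 letter arithmetic
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"

print(cell_ref(1, 2))  # B3
```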

### Output format

Every extracted OCR block is wrapped as:

```text
*[Image OCR]
<extracted text>
[End OCR]*
```
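
As a one-line helper, the wrapping looks like this (a sketch; the function name is not part of the plugin's public API):

```python
def wrap_ocr_block(text):
    """Wrap extracted text in the plugin's *[Image OCR] ... [End OCR]* markers."""
    return f"*[Image OCR]\n{text.strip()}\n[End OCR]*"
```

The leading `*` and trailing `*` render the markers in italics in markdown, which keeps OCR output visually distinct from the document's own text.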

## Troubleshooting

### OCR text missing from output

The most likely cause is a missing `llm_client` or `llm_model`. Verify:

```python
from openai import OpenAI
from markitdown import MarkItDown

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),  # required
    llm_model="gpt-4o",   # required
)
```

### Plugin not loading

Confirm the plugin is installed and discovered:

```bash
markitdown --list-plugins  # should show: ocr
```

### API errors

The plugin surfaces LLM API errors as warnings and continues conversion. Check your API key, your quota, and that the chosen model supports vision inputs.

## Development

### Running Tests

```bash
cd packages/markitdown-ocr
pytest tests/ -v
```

### Building from Source

```bash
git clone https://github.com/microsoft/markitdown.git
cd markitdown/packages/markitdown-ocr
pip install -e .
```

## Contributing

Contributions are welcome! See the [MarkItDown repository](https://github.com/microsoft/markitdown) for guidelines.

## License

MIT; see [LICENSE](LICENSE).

## Changelog

### 0.1.0 (Initial Release)

- LLM Vision OCR for PDF, DOCX, PPTX, XLSX
- Full-page OCR fallback for scanned PDFs
- Context-aware inline text insertion
- Priority-based converter replacement (no code changes required)