feat: add RTF (Rich Text Format) converter #2151

Open
hoangsonww wants to merge 1 commit into microsoft:main from hoangsonww:feat/rtf-converter

@hoangsonww

Summary

Adds a converter for RTF (Rich Text Format) files, which MarkItDown did not previously support. RTF is a common interchange format (WordPad, TextEdit, legacy office exports), so this fills a real gap in the supported file types.

RtfConverter accepts RTF input identified either by the .rtf file extension or by one of the application/rtf, application/x-rtf, text/rtf, or text/richtext MIME types. It strips RTF control words and returns the underlying text content as Markdown.
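For reviewers, the acceptance rule described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `StreamInfoSketch` and `accepts_rtf` are hypothetical names, and the prefix match on the MIME type is an assumption about how parameters such as `; charset=...` would be tolerated.

```python
from dataclasses import dataclass
from typing import Optional

# Extension and MIME types as listed in the PR description.
ACCEPTED_FILE_EXTENSIONS = (".rtf",)
ACCEPTED_MIME_TYPE_PREFIXES = (
    "application/rtf",
    "application/x-rtf",
    "text/rtf",
    "text/richtext",
)


@dataclass
class StreamInfoSketch:
    """Hypothetical stand-in for the stream metadata a converter receives."""

    extension: Optional[str] = None
    mimetype: Optional[str] = None


def accepts_rtf(info: StreamInfoSketch) -> bool:
    """Return True if the metadata looks like an RTF document."""
    extension = (info.extension or "").lower()
    mimetype = (info.mimetype or "").lower()

    if extension in ACCEPTED_FILE_EXTENSIONS:
        return True

    # Prefix match (an assumption) so trailing parameters do not matter.
    return any(mimetype.startswith(prefix) for prefix in ACCEPTED_MIME_TYPE_PREFIXES)
```

Either signal is sufficient on its own, so a stream with no filename but a `text/rtf` content type would still be routed to the converter.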

Implementation

  • New converters/_rtf_converter.py: RtfConverter(DocumentConverter) following the same accepts() / convert() shape as the existing converters. It uses the lightweight, pure-Python striprtf library, imported lazily and gated behind the standard MissingDependencyException (matching the docx/pptx/pdf pattern), so the base install is unaffected.
  • Registered in _markitdown.py and exported from converters/__init__.py.
  • Added an rtf = ["striprtf"] optional-dependency extra in pyproject.toml and folded striprtf into [all].
  • Charset handling mirrors the CSV converter (stream_info.charset with a charset-normalizer fallback).
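The lazy import and dependency gate described in the list above might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the converter itself: `MissingDependencyError` is a local stand-in for MarkItDown's MissingDependencyException, `rtf_bytes_to_text` is a hypothetical helper name, and the decoding step falls back to UTF-8 with replacement rather than using charset-normalizer as the PR does.

```python
from typing import Optional


class MissingDependencyError(Exception):
    """Local stand-in for MarkItDown's MissingDependencyException."""


def rtf_bytes_to_text(data: bytes, charset: Optional[str] = None) -> str:
    """Decode RTF bytes and strip control words, importing striprtf lazily."""
    try:
        # Imported only when a conversion is attempted, so a base install
        # without the optional extra is unaffected.
        from striprtf.striprtf import rtf_to_text
    except ImportError as exc:
        raise MissingDependencyError(
            "RTF conversion needs the optional 'striprtf' package; "
            'try: pip install "markitdown[rtf]"'
        ) from exc

    # Simplified charset handling (an assumption): use the caller's charset
    # if given, otherwise UTF-8 with undecodable bytes replaced.
    text = data.decode(charset or "utf-8", errors="replace")
    return rtf_to_text(text)
```

Keeping the import inside the function means the failure surfaces only when an RTF file is actually converted, with a message that points at the rtf extra.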

Scope

This is a v1 text-extraction converter: it preserves the document's textual content with control words removed, at the same level of fidelity as the plain-text and CSV converters. It does not attempt to map RTF bold or heading runs onto Markdown syntax, which would require a substantially heavier dependency.

Testing

  • Added tests/test_files/test.rtf (a small fixture with heading, bold, italic, and paragraph content) and tests/test_rtf_converter.py (local-path and binary-stream conversion).
  • New RTF tests pass, and the full existing offline suite passes with [all] installed (no regressions).
  • python -m markitdown test.rtf produces the expected Markdown.
  • ruff check and ruff format are clean.

@hoangsonww

Author

@microsoft-github-policy-service agree
