
feat(confluence): support xlsx attachments#502

Open
lvlcn-t wants to merge 4 commits into langchain-ai:main from lvlcn-t:main

lvlcn-t commented Jan 15, 2026

Description

Adds support for Confluence .xlsx attachments in ConfluenceLoader by dispatching on the .xlsx media type and extracting sheet/row text using openpyxl. Includes a minimal unit test covering the happy path and a failed-download path.
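The dispatch described above can be sketched roughly as follows. This is not the loader's actual code: `HANDLERS`, `dispatch_attachment`, and the two stand-in handler bodies are hypothetical names used only for illustration. The media type strings are the standard MIME types for legacy and modern Excel files.

```python
from typing import Callable, Dict

# Standard MIME types for legacy (.xls) and modern (.xlsx) Excel files.
XLS_MEDIA_TYPE = "application/vnd.ms-excel"
XLSX_MEDIA_TYPE = (
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
)


def process_xls(link: str) -> str:
    """Stand-in for the existing .xls handler (hypothetical body)."""
    return f"[xls text from {link}]"


def process_xlsx(link: str) -> str:
    """Stand-in for the new .xlsx handler (hypothetical body)."""
    return f"[xlsx text from {link}]"


# Hypothetical lookup table: media type -> handler.
HANDLERS: Dict[str, Callable[[str], str]] = {
    XLS_MEDIA_TYPE: process_xls,
    XLSX_MEDIA_TYPE: process_xlsx,
}


def dispatch_attachment(media_type: str, link: str) -> str:
    """Route an attachment to its handler; unknown types yield no text."""
    handler = HANDLERS.get(media_type)
    if handler is None:
        return ""
    return handler(link)
```

Whether the real `process_attachment()` uses a lookup table or an `if`/`elif` chain is not stated here; the sketch only shows that the new media type gets its own branch.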

Fixes #413

Dependencies

  • Runtime: openpyxl is an optional dependency used only when processing .xlsx attachments.
  • Tests: the new unit tests are marked with @pytest.mark.requires("openpyxl").
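One common way to keep a dependency optional is to import it only at the point of use and raise a helpful error if it is missing. The sketch below illustrates that pattern; `import_optional` is a hypothetical helper name, not something this PR is known to add.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, feature: str) -> ModuleType:
    """Import a module on demand, with an actionable error if it is absent.

    `import_optional` is a hypothetical helper used for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"`{module_name}` is required for {feature}. "
            f"Install it with `pip install {module_name}`."
        ) from exc
```

With this shape, a call such as `import_optional("openpyxl", ".xlsx attachments")` would only run when an .xlsx attachment is actually processed, so users who never load such attachments need not install the package.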

Tests done

  • Ran make format locally
  • No new linting errors on make lint
  • New unit tests pass on make tests

Review focus

  • Attachment dispatch in ConfluenceLoader.process_attachment() for the .xlsx media type.
  • process_xlsx() output formatting (sheet header + tab-separated rows).
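To make the "sheet header + tab-separated rows" format concrete, here is a minimal sketch. The exact header text, the handling of empty cells, and the blank line between sheets are assumptions; the PR's real `process_xlsx()` may differ. `format_workbook` and `read_xlsx` are hypothetical names.

```python
from io import BytesIO
from typing import Any, Iterable, List, Tuple

Row = Iterable[Any]


def format_workbook(sheets: Iterable[Tuple[str, Iterable[Row]]]) -> str:
    """Render (sheet name, rows) pairs as a header line plus TSV rows.

    Assumed format: "<sheet name>:" followed by one tab-separated line per
    row, with a blank line between sheets. Empty cells become "".
    """
    parts: List[str] = []
    for name, rows in sheets:
        parts.append(f"{name}:")
        for row in rows:
            cells = ["" if cell is None else str(cell) for cell in row]
            parts.append("\t".join(cells))
        parts.append("")  # blank separator line after each sheet
    return "\n".join(parts).rstrip("\n")


def read_xlsx(data: bytes) -> str:
    """Extract text from raw .xlsx bytes (requires the optional openpyxl)."""
    import openpyxl  # imported lazily because it is an optional dependency

    workbook = openpyxl.load_workbook(
        BytesIO(data), read_only=True, data_only=True
    )
    try:
        return format_workbook(
            (ws.title, ws.iter_rows(values_only=True))
            for ws in workbook.worksheets
        )
    finally:
        workbook.close()
```

Keeping the formatting in a pure function means the output shape can be reviewed and tested without openpyxl installed.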

Relates to langchain-ai#413

Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
…ownload failures

Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 15, 2026 16:08
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for .xlsx (modern Excel) attachments in the ConfluenceLoader by implementing a new process_xlsx() method that uses the openpyxl library. Currently, only .xls (legacy Excel) files are supported.

Changes:

  • Added process_xlsx() method to handle .xlsx files using openpyxl
  • Updated process_attachment() to dispatch on the .xlsx media type
  • Added unit tests covering the successful-download and failed-download scenarios
  • Cleaned up imports in test file (removed unused unittest import)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| libs/community/langchain_community/document_loaders/confluence.py | Added process_xlsx() method and dispatching logic for .xlsx media type |
| libs/community/tests/unit_tests/document_loaders/test_confluence.py | Added unit tests for xlsx attachment processing and cleaned up imports |
Comments suppressed due to low confidence (2)

libs/community/langchain_community/document_loaders/confluence.py:47

  • The class docstring should be updated to explicitly mention that both .xls and .xlsx Excel formats are now supported. Consider changing "Excel" to "Excel (.xls and .xlsx)" to clarify that both legacy and modern Excel formats are supported.
    Document object. Currently supported attachment types are: PDF, PNG, JPEG/JPG,
    SVG, Word and Excel.

libs/community/tests/unit_tests/document_loaders/test_confluence.py:389

  • The helper method _create_xlsx_bytes() lacks a docstring explaining its purpose. Following the coding guidelines, this should include a brief description of what it does and what it returns, especially since it's a test utility that could be reused.
    def _create_xlsx_bytes(self) -> bytes:
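For illustration, a documented version of such a helper might look like the sketch below. It is written as a free function with made-up sheet contents; the PR's actual `_create_xlsx_bytes()` body is not shown in this conversation, so everything here is an assumption.

```python
from io import BytesIO


def create_xlsx_bytes() -> bytes:
    """Build a tiny in-memory .xlsx workbook and return its raw bytes.

    The workbook has one sheet named "Sheet1" with a header row and one
    data row, so tests can feed the bytes to the loader without touching
    the filesystem. The sheet contents are illustrative only.

    Returns:
        The serialized workbook as bytes.
    """
    import openpyxl  # optional dependency, imported only when needed

    workbook = openpyxl.Workbook()
    sheet = workbook.active
    sheet.title = "Sheet1"
    sheet.append(["name", "value"])
    sheet.append(["alpha", 1])

    buffer = BytesIO()
    workbook.save(buffer)  # openpyxl accepts a file-like object here
    return buffer.getvalue()
```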

Signed-off-by: lvlcn-t <75443136+lvlcn-t@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Add support for .xlsx files in confluence loader
