Add a docling service file conversion feature #455

Open · wants to merge 1 commit into base: main

nerdalert
Member

  • Integrates automated file conversion via docling-serve into the native knowledge submission component.
  • Supports the docling conversion file types PDF, DOCX, PPTX, XLSX, images, HTML, and AsciiDoc; the converted output is used in the next step, context selection (see demo vid below).
  • A health check of docling-serve runs at file upload selection; if the service is unavailable, only md files are accepted.
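
The health-check fallback described in the last bullet could look roughly like the sketch below. This is not the code from this PR: the function names, the `/health` path, the timeout, and the concrete extension list (in particular which image extensions count as "Images") are all assumptions made for illustration.

```typescript
// Hypothetical sketch: gate the accepted upload types on docling-serve health.
// All names, the '/health' path, and the extension list are assumptions.

const MARKDOWN_EXTENSIONS: string[] = ['.md'];

// Assumed mapping of the supported docling types to file extensions.
const DOCLING_EXTENSIONS: string[] = [
  '.pdf', '.docx', '.pptx', '.xlsx', '.png', '.jpg', '.jpeg', '.html', '.adoc',
];

// Returns true only if the service answers with a 2xx status before the timeout.
async function isDoclingHealthy(baseUrl: string, timeoutMs: number = 3000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`${baseUrl}/health`, { signal: controller.signal });
    return res.ok;
  } catch {
    // Network errors and timeouts are both treated as "service unavailable".
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Pure helper: which extensions the file picker should accept.
function acceptedExtensions(serviceHealthy: boolean): string[] {
  return serviceHealthy
    ? [...MARKDOWN_EXTENSIONS, ...DOCLING_EXTENSIONS]
    : [...MARKDOWN_EXTENSIONS];
}

// Pure helper: checks a file name against the accepted extensions, case-insensitively.
function isFileAccepted(fileName: string, serviceHealthy: boolean): boolean {
  const lower = fileName.toLowerCase();
  return acceptedExtensions(serviceHealthy).some((ext) => lower.endsWith(ext));
}
```

Keeping the extension decision in a pure function, separate from the network call, means the upload component can call `isDoclingHealthy` once when the file picker opens and pass the result along, and the fallback behavior can be tested without a running service.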

This depends on #439. That PR's changes are included in this PR to spare myself another rebase. Will leave this in draft until #439 merges.

Demo .pptx conversion:

pptx-ui-docling-service-demo.mp4

Demo Image conversion:

image-ocr-ui-docling-service-demo.mp4

- Integrates automated file conversion via docling-serve into the
  native knowledge submission component.
- Supports the docling conversions PDF, DOCX, PPTX, XLSX, Images,
  HTML and AsciiDoc to be used in the next step of context selection.

Signed-off-by: Brent Salisbury <[email protected]>
@Misjohns
Collaborator

@nerdalert
Does each file get converted into its own MD file, or do all the files get converted into a single large MD file? My concern is that if a user uploads multiple files and we create a single MD for them, it would be more difficult to locate the context in that file. I believe it would be a better UX to create an MD for each uploaded file for the context selection step. If needed, we could then merge the MDs together into a single file at submit.

@vishnoianil
Member

@nerdalert this PR requires a rebase.
