Move file processing from UI to DocSum backend service #1899

Merged
merged 9 commits into from
May 8, 2025
24 changes: 23 additions & 1 deletion DocSum/docker_compose/amd/gpu/rocm/README.md
@@ -239,13 +239,16 @@ curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
-F "language=en" \
```

Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.
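A client script can mirror the note above. The sketch below only assembles the form fields and sends nothing. `build_docsum_form` is a hypothetical helper, not part of DocSum, and the `max_tokens` and `language` values are simply copied from the curl examples.

```python
from typing import List, Tuple


def build_docsum_form(
    paths: List[str], data_type: str = "text"
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (plain_fields, file_fields) mirroring the curl -F flags.

    Hypothetical helper for illustration; it only builds the field lists.
    """
    plain_fields = [
        ("type", data_type),
        ("messages", ""),  # kept even when empty, as the note requires
        ("max_tokens", "32"),
        ("language", "en"),
    ]
    # One "files" entry per path, like repeating -F "files=@/path".
    file_fields = [("files", path) for path in paths]
    return plain_fields, file_fields


plain, uploads = build_docsum_form(["report.txt", "notes.pdf"])
for name, value in plain + uploads:
    print(f"{name}={value!r}")
```

An HTTP client that supports repeated multipart field names could then send the plain fields as form data and open each listed path as a `files` part.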

### Query with audio and video

- > Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+ > Audio and video can be passed as base64 strings or uploaded by providing a local file path.

Audio:

```bash
# Send base64 string
curl -X POST http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -257,11 +260,21 @@ curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=True"

# Upload file
curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=audio" \
-F "messages=" \
-F "files=@/path to your file (.mp3, .wav)" \
-F "max_tokens=32" \
-F "language=en"
```

Video:

```bash
# Send base64 string
curl -X POST http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -273,6 +286,15 @@ curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=True"

# Upload file
curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=video" \
-F "messages=" \
-F "files=@/path to your file (.mp4)" \
-F "max_tokens=32" \
-F "language=en"
```
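For the base64 route, the JSON body can be produced from a local file rather than pasted by hand. The following is a minimal sketch under stated assumptions: `build_base64_payload` is a hypothetical name, and the demo writes placeholder bytes to a temporary file instead of using a real recording.

```python
import base64
import json
import os
import tempfile


def build_base64_payload(file_path: str, data_type: str) -> str:
    """Return a JSON body shaped like the curl -d examples.

    Hypothetical helper; the file's bytes are sent as a base64 string
    in the "messages" field.
    """
    if data_type not in ("audio", "video"):
        raise ValueError(f"expected 'audio' or 'video', got {data_type!r}")
    with open(file_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return json.dumps({"type": data_type, "messages": encoded})


# Demo with placeholder bytes, not a real recording.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"RIFF placeholder bytes")
    demo_path = tmp.name

try:
    body = build_base64_payload(demo_path, "audio")
finally:
    os.remove(demo_path)

# Decoding the field gives back the original bytes.
decoded = base64.b64decode(json.loads(body)["messages"])
print(decoded == b"RIFF placeholder bytes")
```

The resulting string is what would go after `-d` in the JSON examples above.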

### Query with long context
26 changes: 24 additions & 2 deletions DocSum/docker_compose/intel/cpu/xeon/README.md
@@ -156,16 +156,19 @@ curl http://${host_ip}:8888/v1/docsum \
-F "messages=" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "max_tokens=32" \
- -F "language=en" \
+ -F "language=en"
```

Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.

### Query with audio and video

- > Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+ > Audio and video can be passed as base64 strings or uploaded by providing a local file path.

Audio:

```bash
# Send base64 string
curl -X POST http://${host_ip}:8888/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -177,11 +180,21 @@ curl http://${host_ip}:8888/v1/docsum \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=True"

# Upload file
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=audio" \
-F "messages=" \
-F "files=@/path to your file (.mp3, .wav)" \
-F "max_tokens=32" \
-F "language=en"
```

Video:

```bash
# Send base64 string
curl -X POST http://${host_ip}:8888/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -193,6 +206,15 @@ curl http://${host_ip}:8888/v1/docsum \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=True"

# Upload file
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=video" \
-F "messages=" \
-F "files=@/path to your file (.mp4)" \
-F "max_tokens=32" \
-F "language=en"
```

### Query with long context
24 changes: 23 additions & 1 deletion DocSum/docker_compose/intel/hpu/gaudi/README.md
@@ -161,13 +161,16 @@ curl http://${host_ip}:8888/v1/docsum \
-F "language=en" \
```

Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.

### Query with audio and video

- > Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+ > Audio and video can be passed as base64 strings or uploaded by providing a local file path.

Audio:

```bash
# Send base64 string
curl -X POST http://${host_ip}:8888/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -179,11 +182,21 @@ curl http://${host_ip}:8888/v1/docsum \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=True"

# Upload file
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=audio" \
-F "messages=" \
-F "files=@/path to your file (.mp3, .wav)" \
-F "max_tokens=32" \
-F "language=en"
```

Video:

```bash
# Send base64 string
curl -X POST http://${host_ip}:8888/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -195,6 +208,15 @@ curl http://${host_ip}:8888/v1/docsum \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=True"

# Upload file
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=video" \
-F "messages=" \
-F "files=@/path to your file (.mp4)" \
-F "max_tokens=32" \
-F "language=en"
```

### Query with long context
44 changes: 28 additions & 16 deletions DocSum/docsum.py
@@ -63,6 +63,20 @@ def read_pdf(file):
return docs


def encode_file_to_base64(file_path):
"""Encode the content of a file to a base64 string.

Args:
file_path (str): The path to the file to be encoded.

Returns:
str: The base64 encoded string of the file content.
"""
with open(file_path, "rb") as f:
base64_str = base64.b64encode(f.read()).decode("utf-8")
return base64_str


def video2audio(
video_base64: str,
) -> str:
@@ -163,7 +177,6 @@ def add_remote_service(self):

async def handle_request(self, request: Request, files: List[UploadFile] = File(default=None)):
"""Accept pure text, or files .txt/.pdf.docx, audio/video base64 string."""

if "application/json" in request.headers.get("content-type"):
data = await request.json()
stream_opt = data.get("stream", True)
@@ -193,25 +206,24 @@ async def handle_request(self, request: Request, files: List[UploadFile] = File(
uid = str(uuid.uuid4())
file_path = f"/tmp/{uid}"

- if data_type is not None and data_type in ["audio", "video"]:
- raise ValueError(
- "Audio and Video file uploads are not supported in docsum with curl request, \
- please use the UI or pass base64 string of the content directly."
- )
-
- else:
- import aiofiles
+ import aiofiles

- async with aiofiles.open(file_path, "wb") as f:
- await f.write(await file.read())
+ async with aiofiles.open(file_path, "wb") as f:
+ await f.write(await file.read())

  if data_type == "text":
  docs = read_text_from_file(file, file_path)
- os.remove(file_path)
+ elif data_type in ["audio", "video"]:
+ docs = encode_file_to_base64(file_path)
+ else:
+ raise ValueError(f"Data type not recognized: {data_type}")
+
+ os.remove(file_path)
+
- if isinstance(docs, list):
- file_summaries.extend(docs)
- else:
- file_summaries.append(docs)
+ if isinstance(docs, list):
+ file_summaries.extend(docs)
+ else:
+ file_summaries.append(docs)

if file_summaries:
prompt = handle_message(chat_request.messages) + "\n".join(file_summaries)
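Read together, the changed lines in this hunk write each upload to a temporary path, convert it according to `type`, and then delete the temporary file. A minimal synchronous sketch of that flow follows. It is not the service code: `process_upload` is a hypothetical name, plain `open` stands in for `aiofiles`, text handling is reduced to a UTF-8 decode instead of `read_text_from_file`, and the `try`/`finally` cleanup is an addition of this sketch.

```python
import base64
import os
import tempfile
import uuid


def process_upload(data: bytes, data_type: str) -> str:
    """Store an upload under a random temp name, convert it, then clean up."""
    # Random name, similar in spirit to the diff's f"/tmp/{uid}".
    file_path = os.path.join(tempfile.gettempdir(), str(uuid.uuid4()))
    with open(file_path, "wb") as f:
        f.write(data)
    try:
        if data_type == "text":
            # Simplification: the real handler calls read_text_from_file.
            with open(file_path, "rb") as f:
                return f.read().decode("utf-8")
        if data_type in ("audio", "video"):
            # Same idea as encode_file_to_base64 in the diff.
            with open(file_path, "rb") as f:
                return base64.b64encode(f.read()).decode("utf-8")
        raise ValueError(f"Data type not recognized: {data_type}")
    finally:
        # Sketch-only choice: remove the temp file even if a branch raises.
        if os.path.exists(file_path):
            os.remove(file_path)


print(process_upload(b"hello", "text"))      # hello
print(process_upload(b"\x00\x01", "audio"))  # AAE=
```

Audio and video uploads therefore reach the rest of the pipeline as the same kind of base64 string that the JSON route already accepts.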
28 changes: 28 additions & 0 deletions DocSum/tests/test_compose_on_gaudi.sh
@@ -237,6 +237,20 @@ function validate_megaservice_multimedia() {
"language=en" \
"stream=False"

echo ">>> Checking audio data in form format, upload file"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
"well" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"media" "" \
"type=audio" \
"messages=" \
"files=@$ROOT_FOLDER/data/test.wav" \
"max_tokens=32" \
"language=en" \
"stream=False"

echo ">>> Checking video data in json format"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
@@ -258,6 +272,20 @@ function validate_megaservice_multimedia() {
"max_tokens=32" \
"language=en" \
"stream=False"

echo ">>> Checking video data in form format, upload file"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
"bye" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"media" "" \
"type=video" \
"messages=" \
"files=@$ROOT_FOLDER/data/test.mp4" \
"max_tokens=32" \
"language=en" \
"stream=False"
}

function validate_megaservice_long_text() {
28 changes: 28 additions & 0 deletions DocSum/tests/test_compose_on_xeon.sh
@@ -237,6 +237,20 @@ function validate_megaservice_multimedia() {
"language=en" \
"stream=False"

echo ">>> Checking audio data in form format, upload file"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
"well" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"media" "" \
"type=audio" \
"messages=" \
"files=@$ROOT_FOLDER/data/test.wav" \
"max_tokens=32" \
"language=en" \
"stream=False"

echo ">>> Checking video data in json format"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
@@ -258,6 +272,20 @@ function validate_megaservice_multimedia() {
"max_tokens=32" \
"language=en" \
"stream=False"

echo ">>> Checking video data in form format, upload file"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
"bye" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"media" "" \
"type=video" \
"messages=" \
"files=@$ROOT_FOLDER/data/test.mp4" \
"max_tokens=32" \
"language=en" \
"stream=False"
}

function validate_megaservice_long_text() {
28 changes: 28 additions & 0 deletions DocSum/tests/test_compose_tgi_on_gaudi.sh
@@ -229,6 +229,20 @@ function validate_megaservice_multimedia() {
"language=en" \
"stream=False"

echo ">>> Checking audio data in form format, upload file"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
"well" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"media" "" \
"type=audio" \
"messages=" \
"files=@$ROOT_FOLDER/data/test.wav" \
"max_tokens=32" \
"language=en" \
"stream=False"

echo ">>> Checking video data in json format"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
@@ -250,6 +264,20 @@ function validate_megaservice_multimedia() {
"max_tokens=32" \
"language=en" \
"stream=False"

echo ">>> Checking video data in form format, upload file"
validate_service \
"${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
"bye" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"media" "" \
"type=video" \
"messages=" \
"files=@$ROOT_FOLDER/data/test.mp4" \
"max_tokens=32" \
"language=en" \
"stream=False"
}

function validate_megaservice_long_text() {