Commit 0cfb145

mhbuehler authored and yongfengdu committed
Move file processing from UI to DocSum backend service (opea-project#1899)
Signed-off-by: Melanie Buehler <[email protected]>
1 parent 35e8b13 commit 0cfb145

9 files changed: +301 -149 lines changed
DocSum/docker_compose/amd/gpu/rocm/README.md

+23 -1
@@ -239,13 +239,16 @@ curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
   -F "language=en" \
 ```

+Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.
+
 ### Query with audio and video

-> Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+> Audio and video can be passed as base64 strings or uploaded by providing a local file path.

 Audio:

 ```bash
+# Send base64 string
 curl -X POST http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -257,11 +260,21 @@ curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=audio" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp3, .wav)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```

 Video:

 ```bash
+# Send base64 string
 curl -X POST http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -273,6 +286,15 @@ curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=video" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp4)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```

 ### Query with long context
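
The "Send base64 string" requests in this README carry the media as a base64 string in the `messages` field of a JSON body. As a rough illustration of how such a body could be produced from a local file, here is a minimal Python sketch. The helper name `build_media_payload` is hypothetical and not part of the repository; only the `type` and `messages` keys are taken from the curl examples, and the sketch does not send anything.

```python
import base64
import json
from pathlib import Path


def build_media_payload(file_path: str, media_type: str) -> str:
    """Return a JSON request body carrying a local audio/video file as base64.

    Hypothetical helper for illustration; it only mirrors the two keys
    ("type" and "messages") used by the README's JSON curl examples.
    """
    if media_type not in ("audio", "video"):
        raise ValueError(f"Unsupported media type: {media_type}")
    # Read the raw bytes and encode them as an ASCII-safe base64 string.
    encoded = base64.b64encode(Path(file_path).read_bytes()).decode("utf-8")
    return json.dumps({"type": media_type, "messages": encoded})
```

The returned string could then be passed as the `-d` argument of the JSON request, or posted with any HTTP client that sets `Content-Type: application/json`.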

DocSum/docker_compose/intel/cpu/xeon/README.md

+24 -2
@@ -156,16 +156,19 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "messages=" \
   -F "files=@/path to your file (.txt, .docx, .pdf)" \
   -F "max_tokens=32" \
-  -F "language=en" \
+  -F "language=en"
 ```

+Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.
+
 ### Query with audio and video

-> Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+> Audio and video can be passed as base64 strings or uploaded by providing a local file path.

 Audio:

 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -177,11 +180,21 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=audio" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp3, .wav)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```

 Video:

 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -193,6 +206,15 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=video" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp4)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```

 ### Query with long context
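
The note added to this README says that the `messages` field must be present even when it is empty, and that several files can be sent by repeating the `files` field. To make that request shape concrete, the following is a minimal Python sketch that hand-builds a `multipart/form-data` body with one `files` part per path. The function name `build_multipart_body` is hypothetical, and the generic `application/octet-stream` part type is an assumption; this is roughly what curl does for repeated `-F` flags, not code from the repository.

```python
import uuid
from pathlib import Path
from typing import Dict, List, Tuple


def build_multipart_body(fields: Dict[str, str], file_paths: List[str]) -> Tuple[bytes, str]:
    """Encode form fields plus one "files" part per path.

    Returns (body, content_type). Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    chunks: List[bytes] = []
    # Plain text fields such as type, messages (may be empty), max_tokens, language.
    for name, value in fields.items():
        part = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        chunks.append(part.encode("utf-8"))
    # Every file reuses the same field name "files", as with repeated -F "files=@..." flags.
    for path in file_paths:
        p = Path(path)
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{p.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks.append(header.encode("utf-8") + p.read_bytes() + b"\r\n")
    # Closing delimiter marks the end of the multipart body.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

A call such as `build_multipart_body({"type": "text", "messages": "", "max_tokens": "32", "language": "en"}, ["a.txt", "b.txt"])` yields a body with an empty `messages` part followed by two `files` parts; the returned content type would go in the request's `Content-Type` header.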

DocSum/docker_compose/intel/hpu/gaudi/README.md

+23 -1
@@ -161,13 +161,16 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "language=en" \
 ```

+Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.
+
 ### Query with audio and video

-> Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+> Audio and video can be passed as base64 strings or uploaded by providing a local file path.

 Audio:

 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -179,11 +182,21 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=audio" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp3, .wav)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```

 Video:

 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -195,6 +208,15 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=video" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp4)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```

 ### Query with long context

DocSum/docsum.py

+28 -16
@@ -63,6 +63,20 @@ def read_pdf(file):
     return docs


+def encode_file_to_base64(file_path):
+    """Encode the content of a file to a base64 string.
+
+    Args:
+        file_path (str): The path to the file to be encoded.
+
+    Returns:
+        str: The base64 encoded string of the file content.
+    """
+    with open(file_path, "rb") as f:
+        base64_str = base64.b64encode(f.read()).decode("utf-8")
+    return base64_str
+
+
 def video2audio(
     video_base64: str,
 ) -> str:
@@ -163,7 +177,6 @@ def add_remote_service(self):

     async def handle_request(self, request: Request, files: List[UploadFile] = File(default=None)):
         """Accept pure text, or files .txt/.pdf.docx, audio/video base64 string."""
-
         if "application/json" in request.headers.get("content-type"):
             data = await request.json()
             stream_opt = data.get("stream", True)
@@ -193,25 +206,24 @@ async def handle_request(self, request: Request, files: List[UploadFile] = File(
                     uid = str(uuid.uuid4())
                     file_path = f"/tmp/{uid}"

-                    if data_type is not None and data_type in ["audio", "video"]:
-                        raise ValueError(
-                            "Audio and Video file uploads are not supported in docsum with curl request, \
-                    please use the UI or pass base64 string of the content directly."
-                        )
-
-                    else:
-                        import aiofiles
+                    import aiofiles

-                        async with aiofiles.open(file_path, "wb") as f:
-                            await f.write(await file.read())
+                    async with aiofiles.open(file_path, "wb") as f:
+                        await f.write(await file.read())

+                    if data_type == "text":
                         docs = read_text_from_file(file, file_path)
-                        os.remove(file_path)
+                    elif data_type in ["audio", "video"]:
+                        docs = encode_file_to_base64(file_path)
+                    else:
+                        raise ValueError(f"Data type not recognized: {data_type}")
+
+                    os.remove(file_path)

-                        if isinstance(docs, list):
-                            file_summaries.extend(docs)
-                        else:
-                            file_summaries.append(docs)
+                    if isinstance(docs, list):
+                        file_summaries.extend(docs)
+                    else:
+                        file_summaries.append(docs)

             if file_summaries:
                 prompt = handle_message(chat_request.messages) + "\n".join(file_summaries)
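
The last hunk dispatches on `data_type` after the upload has been written to a temporary path: text files are parsed, audio and video files are base64-encoded, and any other value raises `ValueError`. As written in the hunk, that `raise` appears to happen before `os.remove(file_path)`, so an unrecognized type would leave the temporary file behind. The sketch below shows the same dispatch in a simplified, synchronous form with cleanup in a `finally` block. The function `load_uploaded_content` is hypothetical, and plain UTF-8 reading stands in for `read_text_from_file`, whose body is not shown in this diff.

```python
import base64
import os


def load_uploaded_content(file_path: str, data_type: str) -> str:
    """Turn a saved upload into prompt content, always deleting the temp file.

    Hypothetical, synchronous stand-in for the dispatch in handle_request.
    """
    try:
        if data_type == "text":
            # Assumption: plain UTF-8 text; the real service calls read_text_from_file.
            with open(file_path, "r", encoding="utf-8") as f:
                return f.read()
        if data_type in ("audio", "video"):
            # Same encoding step as encode_file_to_base64 in the diff.
            with open(file_path, "rb") as f:
                return base64.b64encode(f.read()).decode("utf-8")
        raise ValueError(f"Data type not recognized: {data_type}")
    finally:
        # Runs on success and on error, so the temp file never lingers.
        os.remove(file_path)
```

Whether the extra cleanup matters in practice depends on how often clients send an unexpected `type`; it is offered here only as one possible variation on the committed code.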

DocSum/tests/test_compose_on_gaudi.sh

+28
@@ -237,6 +237,20 @@ function validate_megaservice_multimedia() {
         "language=en" \
         "stream=False"

+    echo ">>> Checking audio data in form format, upload file"
+    validate_service \
+        "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
+        "well" \
+        "docsum-gaudi-backend-server" \
+        "docsum-gaudi-backend-server" \
+        "media" "" \
+        "type=audio" \
+        "messages=" \
+        "files=@$ROOT_FOLDER/data/test.wav" \
+        "max_tokens=32" \
+        "language=en" \
+        "stream=False"
+
     echo ">>> Checking video data in json format"
     validate_service \
         "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
@@ -258,6 +272,20 @@ function validate_megaservice_multimedia() {
         "max_tokens=32" \
         "language=en" \
         "stream=False"
+
+    echo ">>> Checking video data in form format, upload file"
+    validate_service \
+        "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
+        "bye" \
+        "docsum-gaudi-backend-server" \
+        "docsum-gaudi-backend-server" \
+        "media" "" \
+        "type=video" \
+        "messages=" \
+        "files=@$ROOT_FOLDER/data/test.mp4" \
+        "max_tokens=32" \
+        "language=en" \
+        "stream=False"
 }

 function validate_megaservice_long_text() {

DocSum/tests/test_compose_on_xeon.sh

+28
@@ -237,6 +237,20 @@ function validate_megaservice_multimedia() {
         "language=en" \
         "stream=False"

+    echo ">>> Checking audio data in form format, upload file"
+    validate_service \
+        "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
+        "well" \
+        "docsum-xeon-backend-server" \
+        "docsum-xeon-backend-server" \
+        "media" "" \
+        "type=audio" \
+        "messages=" \
+        "files=@$ROOT_FOLDER/data/test.wav" \
+        "max_tokens=32" \
+        "language=en" \
+        "stream=False"
+
     echo ">>> Checking video data in json format"
     validate_service \
         "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
@@ -258,6 +272,20 @@ function validate_megaservice_multimedia() {
         "max_tokens=32" \
         "language=en" \
         "stream=False"
+
+    echo ">>> Checking video data in form format, upload file"
+    validate_service \
+        "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
+        "bye" \
+        "docsum-xeon-backend-server" \
+        "docsum-xeon-backend-server" \
+        "media" "" \
+        "type=video" \
+        "messages=" \
+        "files=@$ROOT_FOLDER/data/test.mp4" \
+        "max_tokens=32" \
+        "language=en" \
+        "stream=False"
 }

 function validate_megaservice_long_text() {

DocSum/tests/test_compose_tgi_on_gaudi.sh

+28
@@ -229,6 +229,20 @@ function validate_megaservice_multimedia() {
         "language=en" \
         "stream=False"

+    echo ">>> Checking audio data in form format, upload file"
+    validate_service \
+        "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
+        "well" \
+        "docsum-gaudi-backend-server" \
+        "docsum-gaudi-backend-server" \
+        "media" "" \
+        "type=audio" \
+        "messages=" \
+        "files=@$ROOT_FOLDER/data/test.wav" \
+        "max_tokens=32" \
+        "language=en" \
+        "stream=False"
+
     echo ">>> Checking video data in json format"
     validate_service \
         "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
@@ -250,6 +264,20 @@ function validate_megaservice_multimedia() {
         "max_tokens=32" \
         "language=en" \
         "stream=False"
+
+    echo ">>> Checking video data in form format, upload file"
+    validate_service \
+        "${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum" \
+        "bye" \
+        "docsum-gaudi-backend-server" \
+        "docsum-gaudi-backend-server" \
+        "media" "" \
+        "type=video" \
+        "messages=" \
+        "files=@$ROOT_FOLDER/data/test.mp4" \
+        "max_tokens=32" \
+        "language=en" \
+        "stream=False"
 }

 function validate_megaservice_long_text() {
