feat(client): route parse() via DWS Extract key and reject text+spatial
DWS Extract is a separate product from DWS Processor with its own API key
and credit pool. Calling /extraction/parse with the Processor key returns
403. Add an optional extract_api_key constructor parameter (str or async
callable) that parse() prefers over api_key when set; non-parse methods
keep using api_key. Falling back to api_key keeps a single-key setup
working once tenants get global DWS keys.
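
The key selection described above can be sketched roughly as follows. This is a minimal illustration, not the client's actual code: `resolve_key` and `key_for_parse` are hypothetical helper names, and treating "set" as "not `None`" is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Union

# A key is either a plain string or an async callable that returns one.
ApiKey = Union[str, Callable[[], Awaitable[str]]]


async def resolve_key(key: ApiKey) -> str:
    """Return the key itself, or await the callable that produces it."""
    if callable(key):
        return await key()
    return key


async def key_for_parse(api_key: ApiKey, extract_api_key: Optional[ApiKey] = None) -> str:
    """Prefer the Extract key for parse(); fall back to the main key (assumed rule)."""
    chosen = extract_api_key if extract_api_key is not None else api_key
    return await resolve_key(chosen)


if __name__ == "__main__":
    # With both keys present, parse() would use the Extract key.
    print(asyncio.run(key_for_parse("processor-key", "extract-key")))
```

Non-parse methods would skip `key_for_parse` and resolve `api_key` directly, which keeps the two credit pools separate.
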
Also reject mode='text' + output_format='spatial' before the request goes
out — the text mode only produces markdown, so the combination would 502
on the server side. Surface it as a ValidationError with guidance.
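
A pre-flight check along these lines would catch the combination before any request is sent. It is a sketch under assumptions: `check_parse_options` is a hypothetical name, and the `ValidationError` class here is a local stand-in for whatever the client actually raises.

```python
class ValidationError(ValueError):
    """Local stand-in for the client's ValidationError; the real class may differ."""


def check_parse_options(mode: str, output_format: str = "spatial") -> None:
    """Raise before the request goes out if text mode is paired with spatial output."""
    if mode == "text" and output_format == "spatial":
        raise ValidationError(
            "mode='text' only produces markdown; use output_format='markdown', "
            "or choose mode 'structure', 'understand', or 'agentic' for spatial output."
        )


if __name__ == "__main__":
    check_parse_options("structure", "spatial")  # accepted
    check_parse_options("text", "markdown")      # accepted
    try:
        check_parse_options("text")              # spatial is the default output format
    except ValidationError as exc:
        print(f"rejected: {exc}")
```
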
Addresses PR #47 review feedback from HungKNguyen.
README.md: 17 additions & 1 deletion
@@ -94,6 +94,15 @@ For a complete list of available methods with examples, see the [Methods Documen
 **content-extraction workflows** where you need to feed document content into a
 downstream pipeline rather than render or transform the document itself:
 
+> **Heads up — separate API key.** DWS Extract is a different product from
+> DWS Processor and has its own API key. Pass it as
+> `NutrientClient(api_key=..., extract_api_key=...)`; the Extract key is
+> used only for `parse()`, while every other method continues to use the
+> Processor key. Using the Processor key against `/extraction/parse`
+> returns `403`. If `extract_api_key` is omitted, `parse()` falls back to
+> the main `api_key` — that path works once your tenant moves to global
+> DWS API keys.
+
 - **RAG (retrieval-augmented generation) pipelines** — pull a clean Markdown
   representation of a document for chunking, embedding, and indexing in a
   vector store.
@@ -114,14 +123,21 @@ downstream pipeline rather than render or transform the document itself:
 | `markdown` | RAG, search indexing, content migration — anywhere structured text beats spatial data | One whole-document Markdown string at `response['output']['markdown']` |
 | `spatial` (default) | Form/invoice extraction, layout reconstruction, flows that need per-element confidence | Flat list of typed elements at `response['output']['elements']` |
 
+Spatial output requires an OCR-capable mode (`structure`, `understand`, or
+`agentic`); `mode='text'` is markdown-only and the client rejects the
+`text` + `spatial` combination before the request goes out.
+
 ### Quick start
 
 ```python
 import asyncio
 from nutrient_dws import NutrientClient
 
 async def main():
-    client = NutrientClient(api_key='your_api_key')
+    client = NutrientClient(
+        api_key='your_processor_key',
+        extract_api_key='your_extract_key',
+    )
 
     # Spatial elements (default) — paragraphs, tables, formulas, pictures, etc.