graphify: optional Fabric delegation for the RAG pre-stage #1
Merged
Adds graphify.fabric_url + fabric_key + fabric_rag_weights to
GraphifyConfig. When fabric_url is set, the augment/compress middleware
delegates retrieval to Fabric's /v1/rag instead of the embedded
pgvector + sentence-transformers backend. If unset, embedded behaviour
is unchanged so single-box deployments keep working.
On any Fabric error we log + fall back to embedded retrieval, so a
flaky Fabric host never fails a chat request. Default ranking weights
are pure cosine because Router asks Fabric for code-chunk relevance,
and Fabric's memo-search default of cosine 0.5 + tsvector 0.3 +
recency 0.2 is the wrong blend for that workload.
Adds:
- GraphifyConfig.{FabricURL,FabricKey,FabricRAGWeights,FabricTimeoutMS}
- graphify_fabric.go: fabricRetrieve() + FabricRAG{Request,Response}
- middleware: new retrieve() helper that branches Fabric vs embedded
- main.go: middleware now constructs in fabric-only mode when no local
embedder is available
- four new Prometheus counters:
kronaxis_router_graphify_fabric_{calls,fails,chunks,fallbacks}_total
- examples/fabric-integration.yaml: dedicated reference config
- inline docs in the main config.yaml graphify block
Companion PR: kronaxis-fabric#feature/platform-integration adds the
/v1/rag endpoint this targets.
Smoke-tested end-to-end against fabric v0.10 on DL580
(192.168.50.129:8201) with two seeded chunks; router metrics show
fabric_calls=1, fabric_fails=0, fabric_chunks=2.
Summary
Adds graphify.fabric_url + fabric_key + fabric_rag_weights to
GraphifyConfig. When fabric_url is set, the augment/compress
middleware delegates retrieval to Fabric's /v1/rag instead of the
embedded pgvector + sentence-transformers backend. If unset,
embedded behaviour is unchanged so single-box deployments keep
working.
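- On any Fabric error we log and fall back to embedded retrieval, so a
flaky Fabric host never fails a chat request.
- Default ranking weights are pure cosine because Router asks Fabric for
code-chunk relevance, and Fabric's memo-search default of
cosine 0.5 + tsvector 0.3 + recency 0.2 is the wrong blend for
that workload.
- main.go: middleware now constructs in fabric-only mode when no local
embedder is available
- four new Prometheus counters:
kronaxis_router_graphify_fabric_{calls,fails,chunks,fallbacks}_total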
Companion PR (Fabric side): Kronaxis/kronaxis-fabric#1
Test plan
- Router starts:
graphify enabled: mode=fabric-only fabric_url="..."
- Chat completion logs:
graphify: delegating to fabric url=... top_k=3 weights=cosine:1.00,...
then
graphify: fabric returned 2 chunks (~132 tokens)
- Response headers include X-Kronaxis-Graphify: augment +
X-Kronaxis-Graphify-Chunks: 2
- /metrics shows graphify_fabric_calls_total=1, fails=0, chunks=2
- Backing LLM (qwen-coder-32b) used the chunk content in its reply
Out of scope
chunk-builder code can be retargeted to POST to Fabric's /v1/chunks)
the embedded-path tests still cover the fallback branch)
DO NOT merge yet -- operator review.