
Benchmark: Implement Standardized CRAG Benchmark Suite #179

@cybaea

Description


Benchmark Strategy: Obsidian Vault Intelligence

Goal

Implement a standardized RAG benchmark that replicates the "Benchmark B: Unified Corpus" methodology from obsidian-sonar. This enables direct comparison of accuracy, retrieval quality, and latency against obsidian-sonar on the same dataset (CRAG).

Methodology: "Virtual Vault" Benchmarking

To test against the CRAG Unified Corpus (~60,000 documents) without flooding the user's actual Obsidian Vault with thousands of markdown files, we will implement a Virtual Indexing strategy.

1. Data Source (CRAG)

The benchmark requires two standardized files (compatible with obsidian-sonar format):

  • corpus.jsonl: ~60k entries; each line is { "url": "...", "content": "..." }.
  • queries.jsonl: 100+ sampled queries; each line is { "query": "...", "answer": "...", "gold_urls": [...] }.

Note: We will provide a script or instruction to download/generate these using the official CRAG scripts, ensuring 1:1 data parity.
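As a rough sketch of how the two files could be parsed, the following splits JSONL text into typed records. The type names (`CorpusEntry`, `QueryEntry`) and the helper `parseJsonl` are illustrative assumptions, not the final BenchmarkService API:

```typescript
// Assumed record shapes for the two benchmark files (see formats above).
interface CorpusEntry {
  url: string;
  content: string;
}

interface QueryEntry {
  query: string;
  answer: string;
  gold_urls: string[];
}

// Parse JSONL text: one JSON object per non-empty line.
// A real implementation would stream the ~60k-line corpus instead of
// holding it all in memory; this is a minimal sketch.
function parseJsonl<T>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}
```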

2. Implementation: BenchmarkService

A. Virtual Indexing (Ingestion)

Instead of creating real TFiles, the BenchmarkService will:

  1. Read corpus.jsonl stream.
  2. Feed the content directly into GraphService / IndexerWorker using Virtual Paths (e.g., benchmark/crag/doc_123).
  3. The IndexerWorker will treat these as valid indexed nodes, creating embeddings and keyword indices in the vector database.
  4. Isolation: This benchmark index must be temporary or kept separate from the main vault index, so it never pollutes the user's personal graph. We will implement a GraphService.switchToBenchmarkMode() for this.
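The ingestion steps above need a deterministic mapping from CRAG URLs to virtual paths, so retrieval results can later be scored against gold_urls. A minimal sketch, assuming a `Map`-based registry (the path scheme `benchmark/crag/doc_<i>` is from this proposal; the function names are illustrative):

```typescript
const VIRTUAL_ROOT = "benchmark/crag";

// Deterministic virtual path for the i-th corpus document.
function virtualPathFor(docIndex: number): string {
  return `${VIRTUAL_ROOT}/doc_${docIndex}`;
}

// Build a registry so retrieved virtual paths can be mapped back to the
// original CRAG URLs during scoring.
function buildUrlRegistry(urls: string[]): Map<string, string> {
  const registry = new Map<string, string>();
  urls.forEach((url, i) => registry.set(virtualPathFor(i), url));
  return registry;
}
```

The registry would be populated while streaming corpus.jsonl, just before each entry is handed to the IndexerWorker.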

B. Execution Loop

For each query in queries.jsonl:

  1. Retrieval: Call SearchOrchestrator.search(query).
  2. Scoring (Retrieval):
    • Check if the retrieved virtual paths match the gold_urls from the dataset.
    • Calculate Recall@5, Recall@10, MRR.
  3. Generation (End-to-End) (Optional/Phase 2):
    • Send retrieved context to GeminiService.
    • Generate Answer.
    • Compare with Ground Truth using LLM-as-a-Judge (Accuracy).
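The retrieval-scoring step can be sketched as two pure functions. Recall@K here is the fraction of gold documents found in the top K results (one common definition; the spec may settle on a binary hit-rate instead), and MRR is the mean over all queries of the per-query reciprocal rank computed below:

```typescript
// Fraction of gold documents that appear in the top-k retrieved paths.
function recallAtK(retrieved: string[], gold: string[], k: number): number {
  if (gold.length === 0) return 0;
  const goldSet = new Set(gold);
  const hits = retrieved.slice(0, k).filter((p) => goldSet.has(p)).length;
  return hits / gold.length;
}

// Reciprocal rank of the first gold document in the ranked results.
// MRR = average of this value across all benchmark queries.
function reciprocalRank(retrieved: string[], gold: string[]): number {
  const goldSet = new Set(gold);
  const idx = retrieved.findIndex((p) => goldSet.has(p));
  return idx === -1 ? 0 : 1 / (idx + 1);
}
```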

3. Reporting

Generate a Benchmark_Results.md report that matches the obsidian-sonar format for direct comparison:

| Metric | Vault Intelligence | Obsidian Sonar (Ref) | Diff |
| --- | --- | --- | --- |
| Retrieval Recall@5 | [Result] | N/A | - |
| Indexing Time | [Time] | 6,245s | - |
| Query Latency | [Time] | 33.5s | - |
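Emitting the comparison table above as markdown for Benchmark_Results.md could look like the following. The row shape and function name are assumptions; the header columns and reference figures come from this proposal:

```typescript
interface ReportRow {
  metric: string;
  ours: string; // Vault Intelligence result
  sonar: string; // Obsidian Sonar reference value, or "N/A"
  diff: string;
}

// Render the results as a markdown table matching the obsidian-sonar layout.
function buildReportTable(rows: ReportRow[]): string {
  const header = "| Metric | Vault Intelligence | Obsidian Sonar (Ref) | Diff |";
  const separator = "| --- | --- | --- | --- |";
  const body = rows.map((r) => `| ${r.metric} | ${r.ours} | ${r.sonar} | ${r.diff} |`);
  return [header, separator, ...body].join("\n");
}
```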

Proposed Changes

[NEW] src/services/BenchmarkService.ts

  • loadCorpus(path: string)
  • runCRAGBenchmark()
  • calculateMetrics()

[MODIFY] src/services/GraphService.ts

  • Add insertVirtualFile(path, content): Allow indexing content without a physical TFile.
  • Add resetIndex(namespace): Ability to clear/switch indices.

[MODIFY] src/workers/indexer.worker.ts

  • Ensure metadata/mtime checks can handle virtual inputs (timestamp = 0).
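One way the worker's staleness check could accommodate virtual inputs is to treat mtime 0 as "immutable once indexed". This is a sketch under that assumption; the actual check lives in indexer.worker.ts and may differ:

```typescript
// Decide whether a document needs (re)indexing.
// storedMtime: last-indexed timestamp, undefined if never indexed.
// fileMtime: the file's mtime; virtual benchmark inputs use 0.
function needsReindex(storedMtime: number | undefined, fileMtime: number): boolean {
  if (storedMtime === undefined) return true; // never indexed yet
  if (fileMtime === 0) return false; // virtual input: content never changes
  return fileMtime > storedMtime; // physical file changed on disk
}
```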

User Review Required

  • Resource Intensity: Indexing 60k documents (even virtually) takes significant time and RAM.
  • Storage: The vector store will grow significantly. We must ensure we can cleanly wipe the benchmark data afterwards.
