[FEATURE] Add `ContextualFaithfulnessEvaluator` for RAG evals #65

### Problem Statement

There is currently no way to evaluate whether a RAG response is grounded in its retrieved context. The existing `FaithfulnessEvaluator` checks a response against the conversation history, but RAG systems need validation against the context actually retrieved from the vector store.

### Proposed Solution

Add a `ContextualFaithfulnessEvaluator` that validates responses against a `retrieval_context` field on test `Case`s.
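To make the proposal more concrete, here is a rough sketch of how the retrieved passages could be handed to an LLM judge. Everything in it is an assumption for illustration: the `Case` dataclass is a stand-in for the library's real class (reduced to the two fields used here), and `build_judge_prompt` and the prompt wording are hypothetical, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Case:
    """Stand-in for the library's test case (assumption: only these fields)."""
    input: str
    retrieval_context: List[str] = field(default_factory=list)


def build_judge_prompt(case: Case, response: str) -> str:
    """Hypothetical helper: format the question, the retrieved passages,
    and the response into a single prompt for an LLM judge."""
    # Number the passages so the judge can cite which one supports a claim.
    passages = "\n".join(
        f"[{i}] {passage}"
        for i, passage in enumerate(case.retrieval_context, start=1)
    )
    return (
        "Rate how well the RESPONSE is supported by the CONTEXT alone.\n"
        f"QUESTION: {case.input}\n"
        f"CONTEXT:\n{passages}\n"
        f"RESPONSE: {response}\n"
    )
```

The point of the sketch is only that the judge sees `retrieval_context` rather than the conversation history; the actual prompt and scoring scale would be up to the implementation.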

### Use Case

When using RAG, I need to detect hallucinations, i.e. claims in the response that the retrieved context does not support:
```python
case = Case(
    input="What is the refund policy?",
    retrieval_context=[
        "Refunds available within 30 days of purchase.",
        "Items must be unopened for full refund."
    ]
)
```
The evaluator would then score how well the response is grounded in the `retrieval_context` defined on the `Case`.
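As an illustration of the kind of score I mean, here is a minimal sketch that splits a response into sentence-level claims and reports the fraction supported by the retrieval context. The names `contextual_faithfulness` and `naive_is_supported` are hypothetical, and the token-overlap check is only a toy stand-in for the LLM judge a real evaluator would presumably use.

```python
import re
from typing import Callable, Sequence


def naive_is_supported(claim: str, context: Sequence[str],
                       threshold: float = 0.6) -> bool:
    """Toy support check (assumption, not a real judge): a claim counts as
    supported if enough of its tokens appear in a single passage."""
    claim_tokens = set(re.findall(r"[a-z0-9]+", claim.lower()))
    if not claim_tokens:
        return True
    for passage in context:
        passage_tokens = set(re.findall(r"[a-z0-9]+", passage.lower()))
        overlap = len(claim_tokens & passage_tokens) / len(claim_tokens)
        if overlap >= threshold:
            return True
    return False


def contextual_faithfulness(
    response: str,
    retrieval_context: Sequence[str],
    is_supported: Callable[[str, Sequence[str]], bool] = naive_is_supported,
) -> float:
    """Return the fraction of sentence-level claims in `response` that
    `is_supported` accepts, as a score in [0, 1]."""
    # Split on sentence-ending punctuation followed by whitespace.
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    if not claims:
        return 1.0  # design choice: no claims means nothing ungrounded
    supported = sum(1 for claim in claims if is_supported(claim, retrieval_context))
    return supported / len(claims)


context = [
    "Refunds available within 30 days of purchase.",
    "Items must be unopened for full refund.",
]
response = (
    "Refunds are available within 30 days of purchase. "
    "Shipping is always free worldwide."
)
score = contextual_faithfulness(response, context)  # 0.5: second claim is ungrounded
```

Passing `is_supported` as a parameter keeps the scoring separate from the judging, so an LLM-backed check could replace the toy one without changing how the score is computed.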

### Alternative Solutions

_No response_

### Additional Context

_No response_
