4 changes: 4 additions & 0 deletions agents-manage-api/package.json
@@ -20,12 +20,16 @@
    "typecheck:watch": "tsc --noEmit --watch"
  },
  "dependencies": {
    "@ai-sdk/anthropic": "^1.1.9",
    "@ai-sdk/google": "^1.0.22",
    "@ai-sdk/openai": "^1.0.19",
    "@hono/node-server": "^1.14.3",
    "@hono/swagger-ui": "^0.5.1",
    "@hono/zod-openapi": "^1.0.2",
    "@inkeep/agents-core": "workspace:^",
    "@nangohq/node": "^0.69.5",
    "@nangohq/types": "^0.69.5",
    "ai": "^4.1.11",
    "dotenv": "^17.2.1",
    "drizzle-orm": "^0.44.4",
    "hono": "^4.10.3",
112 changes: 112 additions & 0 deletions agents-manage-api/scripts/README.md
@@ -0,0 +1,112 @@
# Evaluation Scripts

This directory contains utility scripts for running and testing evaluations.

## Available Scripts

### `run-conversation-evaluation.ts`

Demonstrates how to run a conversation evaluation against a conversation that already exists in the database.

**What it does:**

1. Connects to your existing database (no migrations needed)
2. Verifies that the target conversation exists in the database
3. Creates an evaluator using the `createEvaluator` API
4. Creates a conversation evaluation config using the `createConversationEvaluationConfig` API
5. Links the evaluator to the config using the `linkEvaluatorToConfig` API
6. Runs the evaluation using the `EvaluationService`
7. Displays the results with scores and reasoning
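
The steps above can be sketched as a single function. This is a minimal illustration, not the script's actual code: the `EvaluationApi` interface, its argument shapes, and the `conversationExists` and `runEvaluation` methods are assumptions made for the example. Only the names `createEvaluator`, `createConversationEvaluationConfig`, and `linkEvaluatorToConfig` come from this README, and their real signatures in `@inkeep/agents-core` may differ.

```typescript
// Hypothetical shapes: the real signatures in @inkeep/agents-core may differ.
interface Scope {
  tenantId: string;
  projectId: string;
}

interface EvaluationResult {
  id: string;
  status: 'done' | 'failed';
  reasoning: string;
}

// Assumed dependency surface, injected so the flow can run without a database.
interface EvaluationApi {
  conversationExists(scope: Scope, conversationId: string): Promise<boolean>;
  createEvaluator(scope: Scope, name: string): Promise<{ id: string }>;
  createConversationEvaluationConfig(scope: Scope, name: string): Promise<{ id: string }>;
  linkEvaluatorToConfig(scope: Scope, evaluatorId: string, configId: string): Promise<void>;
  runEvaluation(scope: Scope, configId: string, conversationId: string): Promise<EvaluationResult[]>;
}

async function evaluateConversation(
  api: EvaluationApi,
  scope: Scope,
  conversationId: string,
): Promise<EvaluationResult[]> {
  // Step 2: fail fast if the conversation is missing.
  if (!(await api.conversationExists(scope, conversationId))) {
    throw new Error(`Conversation ${conversationId} not found`);
  }
  // Steps 3-5: create the evaluator and config, then link them.
  const evaluator = await api.createEvaluator(scope, 'demo-evaluator');
  const config = await api.createConversationEvaluationConfig(scope, 'demo-config');
  await api.linkEvaluatorToConfig(scope, evaluator.id, config.id);
  // Step 6: run the evaluation and hand the results back for display.
  return api.runEvaluation(scope, config.id, conversationId);
}
```

Injecting the API as a parameter keeps the ordering of the steps testable with an in-memory fake.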

**Configuration:**

Edit the script to set the conversation ID, tenant ID, and project ID:

```typescript
const EXISTING_CONVERSATION_ID = 'ukegsy5b0e02tc9fsbr0p';
const TENANT_ID = 'inkeep'; // Update with your tenant ID
const PROJECT_ID = 'default'; // Update with your project ID
```

**How to run:**

```bash
cd agents-manage-api
pnpm tsx scripts/run-conversation-evaluation.ts
```

**Environment Requirements:**

The script uses the environment variables from your `.env` file. Make sure you have:

- `ANTHROPIC_API_KEY` - Required for running evaluations with Claude
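- `DB_FILE_NAME` - Path to the SQLite database file (optional; defaults to an in-memory database). Set it to the database that contains the conversation, since an in-memory database starts empty.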
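
A missing key is easier to diagnose if the script checks for it before doing any work. The helper below is a minimal sketch of such a check; `findMissingEnvVars` and `assertEnv` are hypothetical names and are not part of the script. Only the variable name `ANTHROPIC_API_KEY` comes from this README.

```typescript
// The variable name comes from this README; the helpers are hypothetical additions.
const REQUIRED_ENV_VARS = ['ANTHROPIC_API_KEY'] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}

// Throws a single error listing every missing variable.
function assertEnv(env: Env): void {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

One way to use it would be to call `assertEnv(process.env)` after the `.env` file has been loaded and before connecting to the database.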

**Output:**

The script prints:

- The tenant, project, conversation, and evaluation config IDs
- Evaluation results, including:
  - Status (done/failed)
  - Reasoning from the LLM
  - Structured evaluation scores:
    - Response Quality (1-5)
    - Professionalism (1-5)
    - Resolution Progress (1-5)
    - Empathy (1-5)
    - Overall Score
    - Strengths identified
    - Areas for improvement

**Example Output:**

```
================================================================================
EVALUATION RESULTS
================================================================================

Tenant ID: test-tenant-abc123
Project ID: default
Conversation ID: conv-xyz789
Evaluation Config ID: eval-config-def456

Total Results: 1
Duration: 3245ms

--------------------------------------------------------------------------------
Result ID: eval-result-001
Status: done

Reasoning:
The agent provided helpful, professional responses with good empathy...

Evaluation Scores:
{
  "responseQuality": 4,
  "professionalism": 5,
  "resolution": 4,
  "empathy": 4,
  "overallScore": 4.25,
  "strengths": [
    "Polite and professional tone",
    "Proactive in providing tracking information"
  ],
  "areasForImprovement": [
    "Could have offered additional assistance"
  ]
}
================================================================================
```
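
The scores object in the example output can be given a type and a range check. The sketch below is illustrative only: the field names mirror the example JSON, while `EvaluationScores`, `validateScores`, and `meanRating` are hypothetical names. In the example, `overallScore` (4.25) equals the unweighted mean of the four ratings (4, 5, 4, 4), but this README does not state how the evaluator derives it, so treat `meanRating` as an assumption.

```typescript
// Field names mirror the example output; the type and helpers are illustrative.
interface EvaluationScores {
  responseQuality: number;
  professionalism: number;
  resolution: number;
  empathy: number;
  overallScore: number;
  strengths: string[];
  areasForImprovement: string[];
}

const RATED_FIELDS = ['responseQuality', 'professionalism', 'resolution', 'empathy'] as const;

// Returns human-readable problems; an empty array means every rating is within 1-5.
function validateScores(scores: EvaluationScores): string[] {
  const problems: string[] = [];
  for (const field of RATED_FIELDS) {
    const value = scores[field];
    if (!Number.isFinite(value) || value < 1 || value > 5) {
      problems.push(`${field} must be between 1 and 5, got ${value}`);
    }
  }
  return problems;
}

// Assumption: the overall score is the unweighted mean of the four ratings.
function meanRating(scores: EvaluationScores): number {
  const total = RATED_FIELDS.reduce((sum, field) => sum + scores[field], 0);
  return total / RATED_FIELDS.length;
}
```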

## Adding New Scripts

When adding new evaluation scripts:

1. Use TypeScript for type safety
2. Import from `@inkeep/agents-core` for database operations
3. Use the logger for structured logging
4. Include comprehensive error handling
5. Provide clear output formatting
6. Document environment requirements
7. Add usage instructions to this README
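
As a starting point for items 3 and 4, a new script could wrap its entry point as in the sketch below. The `Logger` interface and `runScript` are hypothetical; substitute the project's actual logger, whose API is not described in this README.

```typescript
// Hypothetical logger shape; swap in the project's actual logger.
interface Logger {
  info(message: string, data?: Record<string, unknown>): void;
  error(message: string, data?: Record<string, unknown>): void;
}

// Runs a script's main function and converts the outcome into an exit code.
async function runScript(
  name: string,
  main: () => Promise<void>,
  logger: Logger,
): Promise<number> {
  const startedAt = Date.now();
  try {
    await main();
    logger.info(`${name} finished`, { durationMs: Date.now() - startedAt });
    return 0;
  } catch (error) {
    // Normalize non-Error throwables so the log entry always carries a message.
    const message = error instanceof Error ? error.message : String(error);
    logger.error(`${name} failed`, { message });
    return 1;
  }
}
```

A script would then end with something like `runScript('my-script', main, logger).then((code) => process.exit(code))`, so that failures produce a non-zero exit status.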
