diff --git a/docs/agents/configuration/advanced-agent-settings.mdx b/docs/agents/configuration/advanced-agent-settings.mdx new file mode 100644 index 0000000..c15def8 --- /dev/null +++ b/docs/agents/configuration/advanced-agent-settings.mdx @@ -0,0 +1,1409 @@ +--- +title: 'Advanced Agent Settings' +description: 'Configure memory, verification, branching, and enterprise-grade agent features' +icon: 'microchip' +--- + +# Advanced Agent Settings + +Unlock enterprise-grade capabilities for your agents including smart memory systems, consensus-based verification, conversation branching, human oversight, and advanced observability. + + +**When to use Advanced Settings**: Most agents work great with basic configuration. Use advanced settings when you need: +- Learning from past interactions (Memory) +- High-stakes decisions requiring validation (Verification) +- Complex problem-solving with multiple approaches (Branching) +- Human oversight for critical operations (Human-in-the-Loop) +- Production debugging and monitoring (Observability) +- Optimized knowledge retrieval (Knowledge Advanced) + + +--- + +## Configuration Overview + +Advanced agent settings are organized into these categories: + + + + Learn from historical resolutions + + + + Optimize token usage and context + + + + Consensus-based validation + + + + Escalation and checkpoints + + + + Test multiple approaches + + + + Debugging and monitoring + + + + RAG optimization + + + + Fine-tune model behavior + + + +--- + +## Memory System + +Enable agents to learn from past interactions and improve over time by capturing and querying historical resolutions. + + + Enable smart memory system for learning from historical resolutions + + When enabled, agents capture significant moments from conversations and can query this memory bank to improve future responses. 
+ + + + Types of markers to capture in memory + + **Options**: + - `error` - Error occurrences and resolutions + - `question` - Important questions and answers + - `escalation` - Escalation events and outcomes + - `tool_call` - Successful tool usage patterns + - `success` - Successful task completions + + **Example**: `["error", "question", "escalation", "tool_call", "success"]` + + Start with `["error", "success"]` to capture what works and what doesn't + + + + Enable querying memory bank for similar historical problems + + When enabled, agents automatically search memory for similar past situations before responding. + + + + Maximum number of memory entries to retain + + **Range**: 1-1000 + + Older entries beyond this limit are automatically pruned. Higher limits provide more historical context but may slow queries. + + + + Minimum similarity score for memory retrieval + + **Range**: 0.0-1.0 + + - `0.5` - Very loose matching (more recall, less precision) + - `0.7` - Balanced (default) + - `0.9` - Very strict matching (less recall, more precision) + + +### Memory Use Cases + + + + **Scenario**: Support agent handles repetitive issues + + **Configuration**: + ```json + { + "memory": { + "enabled": true, + "marker_types": ["question", "success"], + "max_memory_entries": 500, + "similarity_threshold": 0.75 + } + } + ``` + + **How it works**: + 1. Customer asks about password reset + 2. Agent searches memory: "Have we solved this before?" + 3. Finds 50 similar past resolutions + 4. Uses best-performing solution + 5. Stores this resolution for future reference + + + + **Scenario**: Agent encounters errors and learns fixes + + **Configuration**: + ```json + { + "memory": { + "enabled": true, + "marker_types": ["error", "success"], + "max_memory_entries": 200, + "similarity_threshold": 0.8 + } + } + ``` + + **How it works**: + 1. API call fails with timeout error + 2. Agent searches memory for similar errors + 3. 
Finds previous timeout was fixed by retry with backoff + 4. Applies same solution + 5. Stores pattern for future + + + + **Scenario**: Sales agent learns effective responses to objections + + **Configuration**: + ```json + { + "memory": { + "enabled": true, + "marker_types": ["question", "escalation", "success"], + "max_memory_entries": 300, + "similarity_threshold": 0.7 + } + } + ``` + + **How it works**: + 1. Lead raises pricing objection + 2. Agent searches memory for similar objections + 3. Finds successful past responses + 4. Adapts best approach to current context + 5. Stores outcome for learning + + + + +Memory is scoped per agent. Different agent instances don't share memory unless explicitly configured to use the same memory store. + + +--- + +## Smart Context Advanced + +Optimize token usage and context management with intelligent summarization and dynamic allocation. + + + Enable dynamic token allocation based on query complexity + + Automatically adjusts context window size based on the complexity of the query. Simple queries use fewer tokens, complex queries get more context. + + + + Enable automatic query complexity analysis + + Agent analyzes incoming queries to determine complexity level, which affects token allocation and summarization strategies. + + + + Method for detecting query complexity + + **Options**: + - `keyword` - Fast pattern matching (looks for complexity indicators) + - `llm` - AI-based analysis (more accurate but slower) + - `rubric` - Rule-based scoring system + + **Recommendation**: Use `keyword` for speed, `llm` for accuracy + + + + Enable summarizing multiple large messages into single summaries + + When conversation history grows large, agent automatically summarizes older messages to preserve context while reducing tokens. + + + + Enable ephemeral streams for context-relevant message queuing + + Creates temporary context windows containing only the most relevant parts of the conversation. Useful for very long conversations. 
+ + + + Minimum messages before summarization is triggered + + **Range**: 2-20 + + Conversations shorter than this won't be summarized, preserving full context for brief interactions. + + +### Smart Context Strategies + + +```json Simple Tasks (Fast & Cheap) +{ + "smart_context_advanced": { + "token_limit_auto_adjust": true, + "complexity_detection": true, + "complexity_detection_method": "keyword", + "message_summarization": false, + "ephemeral_streams": false + } +} +``` + +```json Long Conversations (Memory Efficient) +{ + "smart_context_advanced": { + "token_limit_auto_adjust": true, + "complexity_detection": true, + "complexity_detection_method": "llm", + "message_summarization": true, + "ephemeral_streams": true, + "min_messages_for_summary": 3 + } +} +``` + +```json High-Stakes Analysis (Maximum Context) +{ + "smart_context_advanced": { + "token_limit_auto_adjust": true, + "complexity_detection": true, + "complexity_detection_method": "llm", + "message_summarization": false, + "ephemeral_streams": false + } +} +``` + + + +**Cost vs. Quality Trade-off**: +- Summarization reduces costs but may lose nuance +- Ephemeral streams are best for conversations with 20+ messages +- For critical decisions, disable summarization to preserve all context + + +--- + +## Verification & Consensus + +Run agents multiple times and require consensus before returning results. Ideal for high-stakes decisions. + + + Enable consensus-based verification + + Agent runs multiple times and results must agree before being accepted. Dramatically improves accuracy for critical decisions. + + + + Number of runs for consensus + + **Range**: 2-10 + + **Recommended**: + - `3` - Standard consensus (2 out of 3 must agree) + - `5` - High confidence (3 out of 5 must agree) + - `7+` - Mission critical (strong consensus required) + + More runs = more API calls = higher costs. Use judiciously. 
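The run counts above interact with the agreement threshold (`consensus_threshold`). As a rough sketch, assume consensus is reached when the share of agreeing runs is at least the threshold. The exact rule the platform applies is not stated in this page, and `min_agreeing_runs` and `total_llm_calls` are hypothetical helper names used only for illustration.

```python
def min_agreeing_runs(consensus_runs: int, consensus_threshold: float) -> int:
    """Smallest number of agreeing runs whose share meets the threshold.

    Assumption: consensus means agreeing / total >= consensus_threshold.
    """
    if consensus_runs < 1:
        raise ValueError("consensus_runs must be at least 1")
    if not 0.0 < consensus_threshold <= 1.0:
        raise ValueError("consensus_threshold must be in (0, 1]")
    for agreeing in range(1, consensus_runs + 1):
        if agreeing / consensus_runs >= consensus_threshold:
            return agreeing
    return consensus_runs  # not reached for thresholds <= 1.0


def total_llm_calls(consensus_runs: int, calls_per_run: int = 1) -> int:
    """Every consensus run repeats the agent's calls, so usage scales linearly."""
    return consensus_runs * calls_per_run
```

Under this assumption, 3 runs at a threshold of 0.66 need 2 agreeing runs, while 5 runs at the same threshold need 4, so it is worth re-checking the threshold whenever the run count changes.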
+ + + + Agreement threshold for consensus validation + + **Range**: 0.5-1.0 + + - `0.5` - Simple majority (50%+1) + - `0.66` - Supermajority (2/3, default) + - `0.8` - Strong consensus (4/5) + - `1.0` - Unanimous agreement + + + + Specific agent to use for verification runs + + Use a different agent configuration for verification. For example, use GPT-4 to verify GPT-3.5 results. + + **Example**: `"specialist-verifier-agent"` + + +### Verification Examples + + + + **Why verification**: Life-critical decisions require high confidence + + **Configuration**: + ```json + { + "verification": { + "enabled": true, + "consensus_runs": 5, + "consensus_threshold": 0.8, + "verifier_agent": "medical-specialist-verifier" + } + } + ``` + + **Process**: + 1. Run diagnosis 5 times + 2. Require 4/5 agreement (80%) + 3. If consensus reached, return result + 4. If no consensus, escalate to human review + + + + **Why verification**: Costly errors must be prevented + + **Configuration**: + ```json + { + "verification": { + "enabled": true, + "consensus_runs": 3, + "consensus_threshold": 0.66, + "verifier_agent": "financial-auditor-agent" + } + } + ``` + + **Process**: + 1. Analyze transaction 3 times + 2. Require 2/3 agreement + 3. Different verifier agent double-checks + 4. Only proceed if consensus reached + + + + **Why verification**: Legal accuracy is critical + + **Configuration**: + ```json + { + "verification": { + "enabled": true, + "consensus_runs": 5, + "consensus_threshold": 1.0, + "verifier_agent": "legal-specialist-verifier" + } + } + ``` + + **Process**: + 1. Analyze document 5 times + 2. Require 100% agreement (unanimous) + 3. 
Any disagreement triggers human review + + + + +**When to use verification**: +- ✅ High-stakes decisions (legal, medical, financial) +- ✅ Accuracy more important than speed +- ✅ Errors are very costly +- ❌ Real-time responses required +- ❌ Budget constrained +- ❌ Low-stakes tasks + + +--- + +## Human-in-the-Loop + +Pause execution for human review at critical checkpoints or when conditions are met. + + + Enable human-in-the-loop escalation + + Agent can pause and request human approval before proceeding with certain actions. + + + + Automatically escalate to human when errors occur + + Any error during execution triggers immediate human review instead of agent attempting recovery. + + + + Specific checkpoints requiring human review + + **Options**: + - `pre_tool_call` - Before calling any tool (review action before execution) + - `post_tool_call` - After tool calls (review results before proceeding) + - `pre_response` - Before sending final response (review before delivery) + - `post_validation` - After validation completes (review validated output) + + **Example**: `["pre_tool_call", "pre_response"]` + + + + Conditions that trigger human escalation + + Define specific conditions that automatically pause for human review: + + **error_count** (number): Escalate after N consecutive errors + - Default: 3 + - Range: 1+ + + **confidence_score** (number): Escalate when confidence drops below threshold + - Default: 0.5 + - Range: 0.0-1.0 + + **turn_count** (number): Escalate after N turns without resolution + - Default: 10 + - Range: 1+ + + +### Human-in-the-Loop Configuration + + +```json Conservative (Pre-approve Everything) +{ + "human_in_the_loop": { + "enabled": true, + "auto_escalate_on_error": true, + "checkpoints": [ + "pre_tool_call", + "post_tool_call", + "pre_response", + "post_validation" + ], + "escalation_threshold": { + "error_count": 1, + "confidence_score": 0.7, + "turn_count": 5 + } + } +} +``` + +```json Balanced (Review Critical Actions) +{ + 
"human_in_the_loop": { + "enabled": true, + "auto_escalate_on_error": false, + "checkpoints": [ + "pre_tool_call", + "pre_response" + ], + "escalation_threshold": { + "error_count": 3, + "confidence_score": 0.5, + "turn_count": 10 + } + } +} +``` + +```json Minimal (Only When Stuck) +{ + "human_in_the_loop": { + "enabled": true, + "auto_escalate_on_error": false, + "checkpoints": [], + "escalation_threshold": { + "error_count": 5, + "confidence_score": 0.3, + "turn_count": 15 + } + } +} +``` + + +### Escalation Workflow + + + + Agent encounters escalation trigger (error threshold, checkpoint, low confidence) + + + + Flow pauses and agent state is saved + + + + Designated reviewers receive notification with context + + + + Reviewer examines agent state, conversation, and proposed action + + + + Reviewer approves, rejects, or provides guidance + + + + Flow continues based on reviewer decision + + + + +**Best Practice**: Combine with output schema to show reviewers structured data: +```json +{ + "output_schema": { + "action": "string", + "reasoning": "string", + "confidence": "number", + "risk_level": "string" + }, + "human_in_the_loop": { + "enabled": true, + "checkpoints": ["pre_response"] + } +} +``` +Reviewers see exactly what the agent plans to do and why. + + +--- + +## Branching & Checkpoints + +Test multiple approaches in parallel and maintain conversation checkpoints for rollback. + + + Enable conversation branching for testing different approaches + + Agent can create parallel conversation branches to explore multiple solutions simultaneously. + + + + Automatically create checkpoints at critical decision points + + Agent identifies key decision points and creates restore points automatically. + + + + Create checkpoint every N turns + + **Range**: 1 or more + + Regular checkpoints allow rollback if agent goes down wrong path. Lower values = more checkpoints = more storage. 
+ + + + Maximum number of active conversation branches + + **Range**: 1-10 + + Limits parallel exploration to control costs. Higher values explore more solutions but use more API calls. + + +### Branching Strategies + + + + **Use case**: Complex problem with multiple possible approaches + + **Configuration**: + ```json + { + "branching": { + "enabled": true, + "auto_checkpoint": true, + "checkpoint_interval": 3, + "max_branches": 5 + } + } + ``` + + **How it works**: + 1. Agent identifies 3 possible approaches + 2. Creates 3 branches, explores each in parallel + 3. Compares results from all branches + 4. Returns best solution + 5. Discards unsuccessful branches + + + + **Use case**: Operations that might need rollback + + **Configuration**: + ```json + { + "branching": { + "enabled": true, + "auto_checkpoint": true, + "checkpoint_interval": 1, + "max_branches": 2 + } + } + ``` + + **How it works**: + 1. Checkpoint before risky operation + 2. Execute operation in branch + 3. Validate results + 4. If good: merge branch + 5. If bad: discard branch, restore checkpoint + + + + **Use case**: Test different response strategies + + **Configuration**: + ```json + { + "branching": { + "enabled": true, + "auto_checkpoint": false, + "checkpoint_interval": 10, + "max_branches": 3 + } + } + ``` + + **How it works**: + 1. Generate response in 3 different styles + 2. Evaluate each for quality metrics + 3. Select best performing style + 4. Use winner for actual response + + + + +Branching significantly increases API usage (N branches = N times the calls). Use sparingly for high-value tasks only. + + +--- + +## Observability & Debugging + +Production-grade observability, instrumentation, and debugging capabilities. + + + Enable detailed debug logging and instrumentation + + Captures verbose execution details. Essential for development and troubleshooting but adds overhead. 
+ + + + Capture and log LLM reasoning processes + + Records the agent's internal reasoning and decision-making process. Invaluable for understanding behavior. + + + + Capture detailed tool call information and results + + Logs all tool invocations, parameters, responses, and errors. Critical for debugging tool issues. + + + + Enable performance metrics collection + + Tracks execution time, token usage, error rates, and other performance metrics. + + + + Enable distributed tracing with OpenTelemetry + + Creates distributed traces for debugging complex multi-agent flows. Integrates with standard observability tools. + + + + Logging level + + **Options**: + - `debug` - Everything (verbose, use for development) + - `info` - Important events (default for production) + - `warn` - Warnings and errors only + - `error` - Errors only (minimal logging) + + +### Observability Configurations + + +```json Development (Maximum Visibility) +{ + "observability": { + "debug_mode": true, + "capture_thinking": true, + "capture_tool_calls": true, + "metrics_enabled": true, + "tracing_enabled": true, + "log_level": "debug" + } +} +``` + +```json Production (Balanced) +{ + "observability": { + "debug_mode": false, + "capture_thinking": true, + "capture_tool_calls": true, + "metrics_enabled": true, + "tracing_enabled": true, + "log_level": "info" + } +} +``` + +```json Production (Minimal Overhead) +{ + "observability": { + "debug_mode": false, + "capture_thinking": false, + "capture_tool_calls": true, + "metrics_enabled": true, + "tracing_enabled": false, + "log_level": "warn" + } +} +``` + + +### Monitoring Dashboard + +When observability is enabled, access real-time metrics: + + + + - Average response time + - Token usage per request + - API call success/failure rate + - Cost per execution + + + + - Resolution rate + - Escalation frequency + - Tool usage patterns + - Error trends + + + + - End-to-end request traces + - Tool call waterfall + - Reasoning chain visualization + - 
Performance bottlenecks + + + + - Captured thinking process + - Tool call details + - Error stack traces + - State snapshots + + + + +**Integration**: Export traces to: +- DataDog +- New Relic +- Grafana +- Prometheus +- Custom OpenTelemetry collectors + + +--- + +## Knowledge Advanced + +Optimize RAG (Retrieval-Augmented Generation) performance with advanced knowledge retrieval. + + + Enable semantic indexing with llms.txt-style summaries + + Creates semantic indices over knowledge sources for faster, more accurate retrieval. + + + + Initial detail level for images + + **Options**: + - `low` - Fast, low-cost (good for most cases) + - `high` - Detailed analysis (more expensive) + - `auto` - Agent decides based on query + + + + Allow LLM to request higher resolution for specific image regions + + Agent can zoom into specific parts of images when needed, balancing cost with quality. + + + + Enable semantic search across entire knowledge buckets + + Search across all documents in a bucket using semantic similarity rather than keyword matching. + + + + Enable section-based retrieval from indexed documents + + Retrieve specific sections of documents rather than entire files, improving relevance and reducing tokens. + + + + Maximum tokens to retrieve from knowledge sources + + **Range**: 1 or more + + Limits total tokens retrieved from knowledge base per query. Higher values = more context but slower and more expensive. 
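One way to picture how a retrieval token budget behaves is a greedy selection over scored sections. The sketch below is illustrative only: the `Section` type, its fields, and `select_sections` are hypothetical names, and the platform's actual retrieval logic is not described in this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    title: str
    tokens: int   # size of the section in tokens
    score: float  # similarity to the query, higher is better


def select_sections(sections: List[Section], max_retrieval_tokens: int) -> List[Section]:
    """Pick the highest-scoring sections that still fit in the token budget."""
    selected: List[Section] = []
    remaining = max_retrieval_tokens
    for section in sorted(sections, key=lambda s: s.score, reverse=True):
        # Skip sections that would exceed the budget; smaller ones may still fit.
        if section.tokens <= remaining:
            selected.append(section)
            remaining -= section.tokens
    return selected
```

With a small budget, a large but relevant section can be skipped in favor of smaller ones, which is one reason a limit that is too low may lead to incomplete answers.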
+ + +### Knowledge Optimization Strategies + + +```json Fast Lookup (Minimal Tokens) +{ + "knowledge_advanced": { + "semantic_indexing": true, + "image_detail_level": "low", + "adaptive_image_resolution": false, + "bucket_semantic_search": true, + "section_based_retrieval": true, + "max_retrieval_tokens": 5000 + } +} +``` + +```json Balanced (Good Quality) +{ + "knowledge_advanced": { + "semantic_indexing": true, + "image_detail_level": "auto", + "adaptive_image_resolution": true, + "bucket_semantic_search": true, + "section_based_retrieval": true, + "max_retrieval_tokens": 10000 + } +} +``` + +```json Deep Analysis (Maximum Context) +{ + "knowledge_advanced": { + "semantic_indexing": true, + "image_detail_level": "high", + "adaptive_image_resolution": true, + "bucket_semantic_search": true, + "section_based_retrieval": false, + "max_retrieval_tokens": 50000 + } +} +``` + + +### RAG Performance Tips + + + + **Best practices**: + - Use clear section headers (H1, H2, H3) + - Keep sections focused on single topics + - Add descriptive metadata + - Use semantic markup (lists, tables, code blocks) + + **Why**: Section-based retrieval works best with well-structured documents + + + + **Guidelines**: + - Start with 5,000 tokens for simple Q&A + - Use 10,000 tokens for standard analysis + - Go to 20,000+ for comprehensive research + + **Monitor**: If agent frequently says "I don't have enough context", increase limit + + + + **Strategy**: + - Use `low` for charts, diagrams, screenshots + - Use `high` for text-heavy images (invoices, forms) + - Use `auto` when image content varies + - Enable adaptive resolution for large images + + **Cost**: High resolution costs three to five times more tokens than low + + + +--- + +## LLM Config Overrides + +Fine-tune model behavior per agent with runtime configuration overrides. 
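As a sketch of what an override could mean in practice, assume per-agent values in `llm_config` replace defaults key by key, and any key left unset falls back to the default. The default values and the `resolve_llm_config` helper below are hypothetical; the ranges checked are the ones listed for each field in this section.

```python
from typing import Any, Dict

# Hypothetical defaults, chosen only for illustration.
DEFAULT_LLM_CONFIG: Dict[str, Any] = {
    "temperature": 0.7,
    "max_tokens": 4096,
    "max_turns": 10,
    "stream": False,
}


def resolve_llm_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge per-agent overrides onto the defaults and validate documented ranges."""
    unknown = set(overrides) - set(DEFAULT_LLM_CONFIG)
    if unknown:
        raise ValueError(f"Unknown llm_config keys: {sorted(unknown)}")
    resolved = {**DEFAULT_LLM_CONFIG, **overrides}
    if not 0 <= resolved["temperature"] <= 2:
        raise ValueError("temperature must be between 0 and 2")
    for key in ("max_tokens", "max_turns"):
        if resolved[key] < 1:
            raise ValueError(f"{key} must be at least 1")
    return resolved
```

Validating at merge time means a bad override fails before any model call is made, rather than surfacing as a provider error mid-run.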
+ + + Sampling temperature override + + **Range**: 0-2 + + - `0` - Deterministic, focused + - `0.7` - Balanced (typical default) + - `1.5` - Creative, varied + + Overrides the default temperature for this specific agent. + + + + Maximum tokens override + + **Range**: 1+ + + Limits response length. Overrides default for this agent. + + + + Maximum turn count for LLM conversations + + **Range**: 1+ + + Limits how many back-and-forth exchanges the agent can have per execution. + + + + Enable streaming mode for LLM responses + + **Required for**: Claude Sonnet extended operations and long-running tasks + + Stream tokens as they're generated rather than waiting for complete response. Improves perceived latency. + + + +```json Deterministic Tasks +{ + "llm_config": { + "temperature": 0.0, + "max_tokens": 1000, + "max_turns": 5, + "stream": false + } +} +``` + +```json Creative Generation +{ + "llm_config": { + "temperature": 1.2, + "max_tokens": 4000, + "max_turns": 10, + "stream": true + } +} +``` + +```json Extended Operations +{ + "llm_config": { + "temperature": 0.7, + "max_tokens": 8192, + "max_turns": 20, + "stream": true + } +} +``` + + +--- + +## Complete Configuration Example + +Here's a fully configured agent using all advanced features: + +```json +{ + "name": "enterprise-support-agent", + "description": "Production-ready support agent with all safety features", + "llm_provider": "anthropic", + "model": "claude-3-5-sonnet-20241022", + + "llm_config": { + "temperature": 0.7, + "max_tokens": 8192, + "max_turns": 20, + "stream": true + }, + + "agent_settings": { + "memory": { + "enabled": true, + "marker_types": ["error", "question", "success"], + "query_enabled": true, + "max_memory_entries": 500, + "similarity_threshold": 0.75 + }, + + "smart_context_advanced": { + "token_limit_auto_adjust": true, + "complexity_detection": true, + "complexity_detection_method": "llm", + "message_summarization": true, + "ephemeral_streams": true, + "min_messages_for_summary": 3 
+ }, + + "verification": { + "enabled": true, + "consensus_runs": 3, + "consensus_threshold": 0.66, + "verifier_agent": "quality-check-agent" + }, + + "human_in_the_loop": { + "enabled": true, + "auto_escalate_on_error": true, + "checkpoints": ["pre_tool_call", "pre_response"], + "escalation_threshold": { + "error_count": 3, + "confidence_score": 0.6, + "turn_count": 10 + } + }, + + "branching": { + "enabled": true, + "auto_checkpoint": true, + "checkpoint_interval": 5, + "max_branches": 3 + }, + + "observability": { + "debug_mode": false, + "capture_thinking": true, + "capture_tool_calls": true, + "metrics_enabled": true, + "tracing_enabled": true, + "log_level": "info" + }, + + "knowledge_advanced": { + "semantic_indexing": true, + "image_detail_level": "auto", + "adaptive_image_resolution": true, + "bucket_semantic_search": true, + "section_based_retrieval": true, + "max_retrieval_tokens": 15000 + } + } +} +``` + +--- + +## Best Practices by Use Case + + + + **Recommended Settings**: + ```json + { + "memory": { + "enabled": true, + "marker_types": ["question", "success"], + "max_memory_entries": 500 + }, + "smart_context_advanced": { + "message_summarization": true, + "min_messages_for_summary": 3 + }, + "human_in_the_loop": { + "enabled": true, + "escalation_threshold": { + "error_count": 3, + "turn_count": 8 + } + }, + "observability": { + "capture_thinking": true, + "metrics_enabled": true + } + } + ``` + + **Why**: Learn from resolutions, handle long conversations, escalate when stuck, track performance + + + + **Recommended Settings**: + ```json + { + "verification": { + "enabled": true, + "consensus_runs": 5, + "consensus_threshold": 0.8 + }, + "human_in_the_loop": { + "enabled": true, + "checkpoints": ["pre_tool_call", "pre_response"], + "auto_escalate_on_error": true + }, + "observability": { + "debug_mode": true, + "capture_thinking": true, + "tracing_enabled": true, + "log_level": "debug" + } + } + ``` + + **Why**: High accuracy requirements, mandatory 
human oversight, audit trail, full traceability + + + + **Recommended Settings**: + ```json + { + "branching": { + "enabled": true, + "max_branches": 5 + }, + "smart_context_advanced": { + "complexity_detection": true, + "complexity_detection_method": "llm", + "token_limit_auto_adjust": true + }, + "knowledge_advanced": { + "semantic_indexing": true, + "section_based_retrieval": true, + "max_retrieval_tokens": 50000 + }, + "llm_config": { + "temperature": 0.3, + "max_tokens": 8192, + "max_turns": 20 + } + } + ``` + + **Why**: Explore multiple approaches, handle complex queries, deep knowledge retrieval, low-temperature analysis for more consistent output + + + + **Recommended Settings**: + ```json + { + "memory": { + "enabled": true, + "marker_types": ["success"], + "max_memory_entries": 200 + }, + "smart_context_advanced": { + "complexity_detection": true, + "message_summarization": false + }, + "llm_config": { + "temperature": 1.0, + "max_tokens": 4000, + "stream": true + } + } + ``` + + **Why**: Learn successful patterns, preserve full context for creativity, higher temperature, streaming for UX + + + +--- + +## Performance Impact + +Understanding the cost and performance trade-offs of advanced features: + + + + **Impact**: Low + - Adds ~100ms query time + - Minimal token overhead + - Storage costs negligible + + **Recommendation**: Enable for most agents + + + + **Impact**: Medium + - Can reduce tokens by 30-50% + - Adds ~200ms processing time + - Complexity detection (LLM) adds an extra API call + + **Recommendation**: Enable for long conversations + + + + **Impact**: High + - Multiplies API calls by N (consensus runs) + - Cost multiplies by run count (3 runs = 3x cost) + - Adds 2-5 second latency + + **Recommendation**: Only for critical decisions + + + + **Impact**: Variable + - No cost until triggered + - When triggered: flow pauses until a reviewer responds + - Human response time: minutes to hours + + **Recommendation**: Set clear escalation criteria + + + + **Impact**: High + - Multiplies API calls by
branch count + - Cost scales with branches + - Can reduce overall attempts if successful + + **Recommendation**: Use for complex problems only + + + + **Impact**: Low-Medium + - Debug mode: minor performance overhead + - Metrics: minimal overhead + - Tracing: small overhead + - Storage costs for logs + + **Recommendation**: Adjust by environment + + + + **Impact**: Medium + - Semantic indexing: one-time setup cost + - Section retrieval: reduces tokens by half or more + - High image detail: significantly more tokens vs low + + **Recommendation**: Tune max_retrieval_tokens + + + + **Impact**: Variable + - Streaming: Better UX, same cost + - Higher max_tokens: Higher cost per call + - Temperature: No cost impact + + **Recommendation**: Match to use case + + + +--- + +## Troubleshooting + + + + **Symptoms**: Agent doesn't seem to learn from past interactions + + **Possible causes**: + - Similarity threshold too high + - Not enough memory entries captured + - Wrong marker types selected + - Queries too dissimilar to stored memories + + **Solutions**: + 1. Lower `similarity_threshold` to 0.6-0.65 + 2. Increase `max_memory_entries` to 500+ + 3. Add more marker types (especially "success") + 4. Check memory dashboard to verify entries being captured + 5. Review captured memories for relevance + + + + **Symptoms**: Verification always fails, no consensus reached + + **Possible causes**: + - Threshold too high (requiring too much agreement) + - Question too ambiguous + - Temperature too high (too much randomness) + - Verifier agent not configured correctly + + **Solutions**: + 1. Lower `consensus_threshold` to 0.6 (60%) + 2. Reduce `temperature` to 0.3-0.5 for more consistency + 3. Make instructions more specific and deterministic + 4. Review individual run outputs to understand disagreements + 5. 
Try fewer `consensus_runs` (3 instead of 5) + + + + **Symptoms**: Flow constantly pausing for human review + + **Possible causes**: + - Escalation thresholds too low + - Too many checkpoints enabled + - Agent confidence consistently low + - auto_escalate_on_error with common errors + + **Solutions**: + 1. Increase `error_count` threshold (5 instead of 2) + 2. Lower `confidence_score` threshold (0.4 instead of 0.6) + 3. Remove unnecessary checkpoints + 4. Fix underlying errors instead of auto-escalating + 5. Improve agent instructions to boost confidence + + + + **Symptoms**: Unexpected API costs from parallel executions + + **Possible causes**: + - Too many max branches + - All branches exploring full paths + - Checkpoint interval too frequent + + **Solutions**: + 1. Reduce `max_branches` to 2-3 + 2. Implement early branch termination logic + 3. Increase `checkpoint_interval` to 10 + 4. Disable `auto_checkpoint` if not needed + 5. Use branching only for highest-value tasks + + + + **Symptoms**: Agent losing important context, responses less accurate + + **Possible causes**: + - `min_messages_for_summary` too low + - Summarization too aggressive + - Ephemeral streams discarding needed context + + **Solutions**: + 1. Increase `min_messages_for_summary` to 5-7 + 2. Disable `ephemeral_streams` for critical conversations + 3. Set `message_summarization: false` for high-stakes tasks + 4. Review summaries in observability logs + 5. Use `preserve_most_recent` to keep recent messages unsummarized + + + + **Symptoms**: Agent citing wrong documents or sections + + **Possible causes**: + - `max_retrieval_tokens` too low + - Semantic indexing not enabled + - Section-based retrieval too granular + - Documents poorly structured + + **Solutions**: + 1. Increase `max_retrieval_tokens` to 15,000-20,000 + 2. Enable `semantic_indexing` if disabled + 3. Disable `section_based_retrieval` to get full documents + 4. Improve document structure with clear headers + 5. 
Add more context in queries + + + +--- + +## Next Steps + + + + Learn core agent configuration + + + + Add MCP tools to agents + + + + Structure agent responses + + + + Chain multiple agents + + \ No newline at end of file diff --git a/docs/docs.json b/docs/docs.json index ff8e5c6..41eebab 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -84,7 +84,8 @@ "pages": [ "agents/configuration/information-settings", "agents/configuration/provider-settings", - "agents/configuration/context-settings" + "agents/configuration/context-settings", + "agents/configuration/advanced-agent-settings" ] }, "agents/tools-and-connectors",