This project demonstrates Retrieval-Augmented Generation (RAG) using Embabel Agent with Apache Lucene for vector storage and Spring Shell for interaction.
API Key: Set at least one LLM provider API key as an environment variable:

```bash
# For OpenAI (GPT models)
export OPENAI_API_KEY=sk-...

# For Anthropic (Claude models)
export ANTHROPIC_API_KEY=sk-ant-...
```

The model configured in `application.yml` determines which key is required. The default configuration uses OpenAI.
Java: Java 21+ is required.
- Set your API key (see above)
- Run the shell: `./scripts/shell.sh`
- Ingest a document: `ingest`
- Start chatting: `chat`
Run the shell script to start Embabel under Spring Shell:
```bash
./scripts/shell.sh
```

You can also run the main class, `com.embabel.examples.ragbot.RagShellApplication`, directly from your IDE.
| Command | Description |
|---|---|
| `ingest [url]` | Ingest a URL into the RAG store. Uses Apache Tika to parse content hierarchically and chunks it for vector storage (see the sketch after this table). The default URL is the text of the recent Australian social media ban for under-16s. Documents are only ingested if they don't already exist. |
| `ingest-directory <path>` | Ingest all markdown (`.md`) and text (`.txt`) files from a directory recursively. Useful for loading preprocessed content from docling or other sources. |
| `zap` | Clear all documents from the Lucene index. Returns the count of deleted documents. |
| `chunks` | Display all stored chunks with their IDs and content. Useful for debugging what content has been indexed. |
| `chat` | Start an interactive chat session where you can ask questions about ingested content. |
| `uichat [port]` | Launch a web-based chat UI using Javelit. Opens at http://localhost:8888 by default. Use `uichat-stop` to stop. |
| `info` | Show Lucene store info: number of chunks, index size, etc. |
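To make the Tika step concrete, here is a minimal, illustrative sketch of the kind of extraction that happens during `ingest`. It is not the project's actual pipeline (which adds hierarchical chunking on top), and the URL is a placeholder:

```java
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

import java.io.InputStream;
import java.net.URI;

// Illustrative only: fetch a URL and let Tika detect the format and extract text.
public class TikaExtractSketch {
    public static void main(String[] args) throws Exception {
        var parser = new AutoDetectParser();
        var handler = new BodyContentHandler(-1); // -1 disables the default write limit
        var metadata = new Metadata();
        try (InputStream in = URI.create("https://example.com/document").toURL().openStream()) {
            parser.parse(in, handler, metadata); // detects content type, extracts plain text
        }
        System.out.println("Content type: " + metadata.get("Content-Type"));
        System.out.println("Extracted " + handler.toString().length() + " characters");
    }
}
```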
The `uichat` command launches a browser-based chat interface built with Javelit.
```bash
# Start the shell
./scripts/shell.sh

# Ingest a document
ingest https://example.com/document

# View what was indexed
chunks

# Chat with the RAG-powered assistant
chat
> What does this document say about X?

# Clear the index when done
zap
```

If you're not seeing LLM responses in the chat session, the output may be redirected to a log file. Set `redirect-log-to-file` to `false` in `application.yml`:
```yaml
embabel:
  agent:
    shell:
      redirect-log-to-file: false
```

Alternatively, you can tail the log file in a separate terminal to see output:
```bash
tail -f logs/chat-session.log
```

First, check the state of the Lucene index by running the `info` command in the shell.
If the index does contain content, run the `chunks` command to see what has been indexed.
If ingested content is present but you're seeing poor results, it may be that the source was in a format (like complex HTML) that didn't parse cleanly. Consider using docling to convert complex documents to clean markdown before ingestion.
You may also want to adjust chunking parameters in application.yml to better suit your content:
```yaml
ragbot:
  chunker-config:
    max-chunk-size: 800  # Increase for longer chunks
    overlap-size: 100    # Increase for more context overlap
```

```
┌───────────────────────────────────────────────────────────────────────────┐
│                               Spring Shell                                │
│                                                                           │
│  > chat                                                                   │
│  > What penalties apply to social media platforms?                        │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                               AgentProcess                                │
│                                                                           │
│  Starts when chat begins. Manages conversation state and action dispatch. │
│  Listens for triggers (UserMessage) and invokes matching @Action methods. │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │ UserMessage triggers
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                      @Action: ChatActions.respond()                       │
│                                                                           │
│  Fired on each user message. Uses Ai interface to build request:          │
│      context.ai()                                                         │
│        .withLlm(...)                                                      │
│        .withReference(toolishRag)   ◄── ToolishRag added as LLM tool      │
│        .withTemplate("ragbot")                                            │
│        .respondWithSystemPrompt(conversation, ...)                        │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                               Ai Interface                                │
│                                                                           │
│  • Renders system prompt from Jinja template                              │
│  • Packages ToolishRag as tool definition for LLM                         │
│  • Sends request to LLM provider (OpenAI / Anthropic)                     │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                            LLM (GPT / Claude)                             │
│                                                                           │
│  Receives prompt + tool definitions. Decides to call tools as needed:     │
│                                                                           │
│                "I need to search for penalty information..."              │
│                                     │                                     │
│                                     ▼                                     │
│   ┌───────────────────────────────────────────────────────────────────┐   │
│   │ Tool Call: vectorSearch("penalties social media platforms")       │   │
│   └─────────────────────────────────┬─────────────────────────────────┘   │
│                                     │                                     │
└─────────────────────────────────────┼─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                    ToolishRag → LuceneSearchOperations                    │
│                                                                           │
│  • Converts query to embedding vector                                     │
│  • Searches ./.lucene-index for similar chunks                            │
│  • Returns relevant content to LLM                                        │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
                        LLM generates final response
                           using retrieved context
                                      │
                                      ▼
                             Response sent to user
```
Flow Summary:

- User types `chat` → AgentProcess starts and manages the session
- Each user message triggers `@Action(trigger = UserMessage.class)`
- `ChatActions.respond()` builds a request via the Ai interface, adding `ToolishRag` with `.withReference()`
- Ai packages the prompt and tool definitions and sends them to the LLM
- The LLM decides to call a ToolishRag tool to search for relevant content
- The ToolishRag tool queries the Lucene index and returns matching chunks to the LLM (see the retrieval sketch after this list)
- The LLM generates a response using the retrieved context, which is sent back to the user
- The loop continues for each new message until the user exits
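For intuition about the retrieval step, here is a minimal sketch of a Lucene kNN lookup of the kind `LuceneSearchOperations` performs. The field names (`embedding`, `content`), the vector dimension, and the `embed` helper are assumptions for illustration, not the project's actual schema:

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

// Illustrative only: embed the query, then find the k nearest chunk vectors.
public class VectorSearchSketch {
    public static void main(String[] args) throws Exception {
        float[] queryVector = embed("penalties social media platforms");
        try (var reader = DirectoryReader.open(FSDirectory.open(Paths.get("./.lucene-index")))) {
            var searcher = new IndexSearcher(reader);
            // Assumed field names: "embedding" for vectors, "content" for chunk text.
            var topDocs = searcher.search(new KnnFloatVectorQuery("embedding", queryVector, 5), 5);
            var storedFields = searcher.storedFields();
            for (var scoreDoc : topDocs.scoreDocs) {
                System.out.println(storedFields.document(scoreDoc.doc).get("content"));
            }
        }
    }

    // Placeholder: in the real flow the configured embedding service produces this vector.
    static float[] embed(String text) {
        return new float[1536]; // assumed dimension
    }
}
```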
RAG is configured in `RagConfiguration.java`:

```java
@Bean
LuceneSearchOperations luceneSearchOperations(
        ModelProvider modelProvider,
        RagbotProperties properties) {
    var embeddingService = modelProvider.getEmbeddingService(DefaultModelSelectionCriteria.INSTANCE);
    var luceneSearchOperations = LuceneSearchOperations
            .withName("docs")
            .withEmbeddingService(embeddingService)
            .withChunkerConfig(properties.chunkerConfig())
            .withIndexPath(Paths.get("./.lucene-index"))
            .buildAndLoadChunks();
    return luceneSearchOperations;
}
```

Key aspects:
- Lucene with disk persistence: the vector index is stored at `./.lucene-index`, surviving application restarts
- Embedding service: uses the configured `ModelProvider` to get an embedding service for vectorizing content
- Configurable chunking: content is split into chunks with configurable size (default 800 chars), overlap (default 50 chars), and optional section title inclusion (see the sketch after this list)
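To see how size and overlap interact, here is a minimal sketch of fixed-size chunking with overlap. It is illustrative only; the real chunker also handles section titles and content boundaries:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of fixed-size chunking with overlap (illustrative only).
// Assumes overlapSize < maxChunkSize.
public class OverlapChunker {
    static List<String> chunk(String text, int maxChunkSize, int overlapSize) {
        List<String> chunks = new ArrayList<>();
        int step = maxChunkSize - overlapSize; // each chunk re-reads the last overlapSize chars
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        var chunks = chunk("a".repeat(2000), 800, 100);
        System.out.println(chunks.size()); // 3 chunks: [0,800), [700,1500), [1400,2000)
    }
}
```

With `max-chunk-size: 800` and `overlap-size: 100`, each chunk starts 700 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.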
Chunking properties can be configured via application.yml:
```yaml
ragbot:
  chunker-config:
    max-chunk-size: 800
    overlap-size: 100
```

The chatbot is created in `ChatConfiguration.java`:

```java
@Bean
Chatbot chatbot(AgentPlatform agentPlatform) {
    return AgentProcessChatbot.utilityFromPlatform(agentPlatform);
}
```

The `AgentProcessChatbot.utilityFromPlatform()` method creates a chatbot that automatically discovers all `@Action` methods in `@EmbabelComponent` classes. Any action with a matching trigger becomes eligible to be called when appropriate messages arrive.
Chat actions are defined in `ChatActions.java`:

```java
@EmbabelComponent
public class ChatActions {

    private final ToolishRag toolishRag;
    private final RagbotProperties properties;

    public ChatActions(SearchOperations searchOperations, RagbotProperties properties) {
        this.toolishRag = new ToolishRag(
                "sources",
                "Sources for answering user questions",
                searchOperations);
        this.properties = properties;
    }

    @Action(canRerun = true, trigger = UserMessage.class)
    void respond(Conversation conversation, ActionContext context) {
        var assistantMessage = context.ai()
                .withLlm(properties.chatLlm())
                .withReference(toolishRag)
                .withTemplate("ragbot")
                .respondWithSystemPrompt(conversation, Map.of(
                        "properties", properties
                ));
        context.sendMessage(conversation.addMessage(assistantMessage));
    }
}
```

Key concepts:
- `@EmbabelComponent`: marks the class as containing agent actions that can be discovered by the platform
- `@Action` annotation:
  - `trigger = UserMessage.class`: the action is invoked whenever a `UserMessage` is received in the conversation
  - `canRerun = true`: the action can be executed multiple times (once for each user message)
- `ToolishRag` as LLM reference:
  - Wraps the `SearchOperations` (Lucene index) as a tool the LLM can use
  - When `.withReference(toolishRag)` is called, the LLM can search the RAG store to find relevant content
  - The LLM decides when to use this tool based on the user's question
- Response flow:
  - User sends a message (triggering the action)
  - The action builds an AI request with the RAG reference
  - The LLM may call the RAG tool to retrieve relevant chunks
  - The LLM generates a response using retrieved context
  - The response is added to the conversation and sent back
Chatbot prompts are managed using Jinja templates rather than inline strings. This is best practice for chatbots because:
- Prompts grow complex: Chatbots require detailed system prompts covering persona, guardrails, objectives, and behavior guidelines
- Separation of concerns: Prompt engineering can evolve independently from Java code
- Reusability: Common elements (guardrails, personas) can be shared across different chatbot configurations
- Configuration-driven: Switch personas or objectives via `application.yml` without code changes
The template system separates two concerns:
- Objective: What the chatbot should accomplish - the task-specific instructions and domain expertise (e.g., analyzing legal documents, answering technical questions)
- Voice: How the chatbot should communicate - the persona, tone, and style of responses (e.g., formal lawyer, Shakespearean, sarcastic)
This separation allows mixing and matching. You could have a "legal" objective answered in the voice of Shakespeare, Monty Python, or a serious lawyer - without duplicating the legal analysis instructions in each persona template.
```
src/main/resources/prompts/
├── ragbot.jinja                 # Main template entry point
├── elements/
│   ├── guardrails.jinja         # Safety and content restrictions
│   └── personalization.jinja    # Dynamic persona/objective loader
├── personas/                    # HOW to communicate (voice/style)
│   ├── clause.jinja             # Serious legal expert
│   ├── shakespeare.jinja        # Elizabethan style
│   ├── monty_python.jinja       # Absurdist humor
│   └── ...
└── objectives/                  # WHAT to accomplish (task/domain)
    └── legal.jinja              # Legal document analysis
```
The main template `ragbot.jinja` composes the system prompt from reusable elements:

```jinja
{% include "elements/guardrails.jinja" %}
{% include "elements/personalization.jinja" %}
```

The `personalization.jinja` template dynamically includes persona and objective based on configuration:

```jinja
{% set persona_template = "personas/" ~ properties.voice().persona() ~ ".jinja" %}
{% include persona_template %}

{% set objective_template = "objectives/" ~ properties.objective() ~ ".jinja" %}
{% include objective_template %}
```

Templates are invoked using `.withTemplate()` and passing bindings:

```java
context.ai()
    .withLlm(properties.chatLlm())
    .withReference(toolishRag)
    .withTemplate("ragbot")
    .respondWithSystemPrompt(
        conversation,
        Map.of(
            "properties", properties
        ));
```

The `properties` object (a Java record) is accessible in templates. Jinjava supports calling record accessor methods with `properties.voice().persona()` syntax for nested records.
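A minimal sketch of that resolution, using hypothetical record shapes (the real `RagbotProperties` has more fields):

```java
import com.hubspot.jinjava.Jinjava;
import java.util.Map;

// Illustrative sketch: Jinjava resolving nested record accessors in a template.
public class JinjavaRecordDemo {
    record Voice(String persona, int maxWords) {}
    record RagbotProperties(Voice voice, String objective) {}

    public static void main(String[] args) {
        var properties = new RagbotProperties(new Voice("shakespeare", 30), "legal");
        var jinjava = new Jinjava();
        var template = "Persona: {{ properties.voice().persona() }}, objective: {{ properties.objective() }}";
        String rendered = jinjava.render(template, Map.of("properties", properties));
        System.out.println(rendered); // Persona: shakespeare, objective: legal
    }
}
```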
To create a new persona, add a `.jinja` file to `prompts/personas/` and reference it by name in `application.yml`.
See Configuration Reference for all available settings.
This section walks through creating a new chatbot configuration from scratch, using a film critic example.
The objective defines what the chatbot should accomplish. Create a new file at:
`src/main/resources/prompts/objectives/discuss_films.jinja`
Example content based on existing objectives:

```
Answer questions about classic cinema and film history in a clear and engaging manner.
The tools available to you access a curated collection of film reviews and criticism.
You must always use these tools to find answers, as your general knowledge will not extend to everything in the collection,
and these tools allow you to find detailed analysis if you try hard enough.
Always back up your points with direct quotes from the film criticism sources.
You may find that the result of one tool call leads to another search,
e.g. a result mentioning "as discussed in the analysis of Citizen Kane..." might lead to a search for "Citizen Kane analysis".
DO NOT RELY ON GENERAL KNOWLEDGE unless you are certain a better answer is not in the provided sources.
```

The persona defines how the chatbot communicates. Create a new file at:
`src/main/resources/prompts/personas/film_critic.jinja`
Example content based on existing personas:

```
Your name is Cinephile.
You are a passionate film critic with deep knowledge of cinema history.
You want to share your love of films with others and help them appreciate the art of filmmaking.
You speak with enthusiasm about cinematography, direction, and storytelling.
```

After creating the files, your prompts directory should look like:
```
src/main/resources/prompts/
├── ragbot.jinja
├── elements/
│   ├── guardrails.jinja
│   └── personalization.jinja
├── personas/
│   ├── clause.jinja
│   ├── music-guide.jinja
│   ├── film_critic.jinja    # NEW
│   └── ...
└── objectives/
    ├── legal.jinja
    ├── music.jinja
    ├── discuss_films.jinja  # NEW
    └── ...
```
The ToolishRag description in `ChatActions.java` helps the LLM understand what content is available. Update the constructor to describe your new content:

```java
public ChatActions(
        SearchOperations searchOperations,
        RagbotProperties properties) {
    this.toolishRag = new ToolishRag(
            "sources",
            "Film reviews and criticism: Classic cinema analysis and reviews", // Updated description
            searchOperations)
            .withHint(TryHyDE.usingConversationContext());
    this.properties = properties;
}
```

The description should briefly explain what content the RAG store contains, helping the LLM make better decisions about when and how to search. The `.withHint(TryHyDE.usingConversationContext())` call additionally hints the retrieval layer to try HyDE (Hypothetical Document Embeddings), drawing on conversation context to improve the vector search.
Use the `ingest-directory` command to load a directory of markdown or text files:

```bash
# Start the shell
./scripts/shell.sh

# Ingest a directory of film reviews (markdown or text files)
ingest-directory /path/to/film-reviews

# Verify content was indexed
chunks
```

The `ingest-directory` command recursively processes all `.md` and `.txt` files in the specified directory, chunking them for vector storage.
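The file selection amounts to a standard recursive walk. Here is a minimal illustrative sketch of that traversal (the real command goes on to chunk and embed each file):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustrative only: recursively collect .md and .txt files under a root directory.
public class DirectoryIngestSketch {
    public static void main(String[] args) throws IOException {
        Path root = Path.of("/path/to/film-reviews");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.toString().endsWith(".md") || p.toString().endsWith(".txt"))
                 .forEach(p -> System.out.println("Would ingest: " + p));
        }
    }
}
```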
Finally, update your configuration to use the new objective and persona:
```yaml
ragbot:
  voice:
    persona: film_critic        # References personas/film_critic.jinja
    max-words: 50
  objective: discuss_films      # References objectives/discuss_films.jinja
  chat-llm:
    model: gpt-4.1-mini
    temperature: 0.3            # Slightly creative for engaging film discussion
```

| File | Purpose |
|---|---|
| `prompts/objectives/discuss_films.jinja` | Defines the task: answering questions about films |
| `prompts/personas/film_critic.jinja` | Defines the voice: enthusiastic cinema expert |
| `ChatActions.java` (constructor) | Describes the RAG content for the LLM |
| `application.yml` | Wires everything together |
Restart the application after making these changes.
All configuration is externalized in application.yml, allowing behavior changes without code modifications.
```yaml
ragbot:
  # RAG chunking settings
  chunker-config:
    max-chunk-size: 800     # Maximum characters per chunk
    overlap-size: 100       # Overlap between chunks for context continuity

  # LLM model selection and hyperparameters
  chat-llm:
    model: gpt-4.1-mini     # Model to use for chat responses
    temperature: 0.0        # 0.0 = deterministic, higher = more creative

  # Voice controls HOW the chatbot communicates
  voice:
    persona: clause         # Which persona template to use (personas/*.jinja)
    max-words: 30           # Hint for response length

  # Objective controls WHAT the chatbot accomplishes
  objective: legal          # Which objective template to use (objectives/*.jinja)

embabel:
  agent:
    shell:
      # Redirect logging during chat sessions
      redirect-log-to-file: true
```

When `redirect-log-to-file: true`, console logging is redirected to a file during chat sessions, providing a cleaner
chat experience. Logs are written to:
`logs/chat-session.log`
To monitor logs while chatting, open a separate terminal and tail the log file:

```bash
tail -f logs/chat-session.log
```

This is useful for debugging RAG retrieval, seeing which chunks are being returned, and monitoring LLM API calls.
To change the chatbot's personality, simply update the persona value:

```yaml
ragbot:
  voice:
    persona: shakespeare   # Now responds in Elizabethan English
```

To use a different LLM:
```yaml
ragbot:
  chat-llm:
    model: gpt-4.1         # Use the larger GPT-4.1 instead
    temperature: 0.7       # More creative responses
```

No code changes required; just restart the application.
Docling is a document conversion tool that excels at converting complex formats (PDF, Word, HTML, PowerPoint) to clean markdown. This is useful when source documents don't parse well with standard tools.
Note: Docling can be slow, especially for large or complex documents. Plan accordingly.
You'll need Python. It's good practice to set up a virtual environment first:
```bash
# Using venv (Python standard library)
python -m venv docling-env
source docling-env/bin/activate  # On Windows: docling-env\Scripts\activate

# Or using conda
conda create -n docling python=3.11
conda activate docling
```

Then install docling:
```bash
pip install docling
```

Convert a single file to markdown:
```bash
docling document.pdf --to md --output output_dir/
```

Convert all files in a directory:
```bash
docling input_dir/ --to md --output output_dir/
```

Docling is particularly useful for:

- PDF documents with complex layouts, tables, or images
- HTML pages that don't parse cleanly with Tika
- Word documents with formatting that needs preservation
- Any document where the default ingestion produces poor chunking
After converting to markdown, use `ingest-directory` to load the cleaned content:

```bash
ingest-directory output_dir/
```

See the docling documentation for more options and advanced usage.

