A knowledge graph and AI-powered question-answering system for Marvel character genetic mutations and powers. This system combines a Neo4j knowledge graph with Google Gemini LLM integration to provide fact-grounded answers about characters, genes, powers, and team affiliations.
Project Gene-Forge is a S.H.I.E.L.D. intelligence system that:
- Stores Marvel character data in a Neo4j knowledge graph
- Uses deterministic query routing to extract facts from the graph
- Generates natural language responses using Google Gemini LLM
- Provides both a web interface and REST API for querying
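
To make "deterministic query routing" concrete, here is a minimal sketch of how a question could be mapped to a fixed Cypher template. It is not the project's actual code. The names `ROUTES`, `resolve_entity`, and `route` are hypothetical, as are the `Power` and `Team` labels, the `p.name` and `t.name` properties, and the relationship directions for `POSSESSES_POWER` and `MEMBER_OF`. The character names in the usage note are placeholders, not taken from the dataset.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical routing table: the first pattern that matches wins,
# so the same question always yields the same Cypher template.
ROUTES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\b(gene|genes|mutation|mutations)\b", re.I),
     "MATCH (c:Character {name: $name})-[:HAS_MUTATION]->(g:Gene) "
     "RETURN g.gene_name"),
    (re.compile(r"\b(power|powers|abilities)\b", re.I),
     "MATCH (c:Character {name: $name})-[:POSSESSES_POWER]->(p:Power) "
     "RETURN p.name"),  # p.name is an assumed property
    (re.compile(r"\b(team|teams|member)\b", re.I),
     "MATCH (c:Character {name: $name})-[:MEMBER_OF]->(t:Team) "
     "RETURN t.name"),  # t.name is an assumed property
]


def resolve_entity(question: str, known_names: List[str]) -> Optional[str]:
    """Return the first known character name mentioned in the question."""
    lowered = question.lower()
    for name in known_names:
        if name.lower() in lowered:
            return name
    return None


def route(question: str, known_names: List[str]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map a question to (cypher, parameters), or None if nothing matches."""
    name = resolve_entity(question, known_names)
    if name is None:
        return None
    for pattern, cypher in ROUTES:
        if pattern.search(question):
            return cypher, {"name": name}
    return None
```

Calling `route("What genes does Wolverine have?", ["Wolverine"])` would select the `HAS_MUTATION` template with `{"name": "Wolverine"}` as its parameter. No LLM takes part in choosing the query, so the facts passed on to the response step are reproducible.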
- Neo4j Knowledge Graph: Stores characters, genes, powers, and teams with their relationships
- Graph Query Engine: Routes natural language questions to Cypher queries
- LLM Integration: Google Gemini API for generating fact-grounded responses
- FastAPI Web Application: REST API and web UI for interactive queries
- Response Caching: In-memory cache to reduce API calls and improve performance
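
The response cache can be pictured as a dictionary whose entries expire. The sketch below is illustrative and is not the contents of `llm_queries/cache.py`. The class name `TTLCache` and its methods are hypothetical, and the default lifetime mirrors the `CACHE_TTL` value of 86400 seconds used in this README.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds: float = 86400,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop stale entries lazily on read
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        """Store a value together with its expiry time."""
        self._store[key] = (self._clock() + self._ttl, value)
```

A service could key the cache on the normalized question text and call the LLM only when `get` returns `None`.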
├── app/
│ ├── api.py # FastAPI application with REST endpoints
│ └── index.html # Web UI for interactive queries
├── data/
│ ├── characters.json # Character data (12 Marvel characters)
│ ├── gene_power_relationships.json # Gene-power mappings
│ └── DATA_README.md # Dataset documentation
├── graph/
│ ├── setup_graph.py # Script to build Neo4j knowledge graph
│ └── GRAPH_SCHEMA.md # Graph schema documentation
├── queries/
│ ├── graph_qa.py # Graph query engine (entity resolution, intent classification)
│ └── graph_query_selection_layer.md # Query routing documentation
├── llm_queries/
│ ├── llm_graph_qa.py # Integrated LLM + Graph QA service
│ ├── llm_integration.py # Gemini API integration
│ └── cache.py # Response caching implementation
├── example_usage.py # Example script demonstrating usage
└── requirements.txt # Python dependencies
Before setting up the project, ensure you have:
- Python 3.8+ installed
  - Check with: `python --version` or `python3 --version`
- Neo4j Database (Community Edition or Desktop)
  - Download from: https://neo4j.com/download/
  - Neo4j Desktop is recommended for local development
  - Ensure Neo4j is running before proceeding
- Google Gemini API Key
  - Get your API key from: https://aistudio.google.com/app/api-keys
# Create a virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

Neo4j Desktop
- Download Neo4j Desktop
- Install and launch Neo4j Desktop
- Create a new database (or use default)
- Start the database
- Note the connection details (URI, username, password)
Create a .env file in the project root directory:
# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password_here
NEO4J_DATABASE=gene-forge
# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.5-flash-lite
# Optional Configuration
LLM_TEMPERATURE=0.0
CACHE_TTL=86400
PORT=8000
HOST=127.0.0.1

Important: Replace `your_neo4j_password_here` with your actual Neo4j password and `your_gemini_api_key_here` with your Gemini API key.
Note: The project uses python-dotenv to load environment variables.
| Variable | Description | Default |
|---|---|---|
| `NEO4J_URI` | Neo4j connection URI | bolt://localhost:7687 |
| `NEO4J_USERNAME` | Neo4j username | neo4j |
| `NEO4J_PASSWORD` | Neo4j password | Required |
| `NEO4J_DATABASE` | Database name | gene-forge |
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `GEMINI_MODEL` | Gemini model name | gemini-2.5-flash-lite |

| Variable | Description | Default |
|---|---|---|
| `LLM_TEMPERATURE` | LLM temperature (0.0-1.0) | 0.0 |
| `CACHE_TTL` | Cache TTL in seconds | 86400 (24 hours) |
| `PORT` | FastAPI server port | 8000 |
| `HOST` | FastAPI server host | 127.0.0.1 |
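
As an illustration of how these variables and defaults fit together, the sketch below reads them from an environment mapping, applies the documented defaults, and fails early when a required value is missing. The function name `load_settings` is hypothetical, and the project's own loading code may differ. If python-dotenv is in use, calling its `load_dotenv()` beforehand would copy the `.env` values into `os.environ`.

```python
import os
from typing import Any, Dict, Mapping

# Defaults as documented in this README.
DEFAULTS: Dict[str, str] = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_DATABASE": "gene-forge",
    "GEMINI_MODEL": "gemini-2.5-flash-lite",
    "LLM_TEMPERATURE": "0.0",
    "CACHE_TTL": "86400",
    "PORT": "8000",
    "HOST": "127.0.0.1",
}
REQUIRED = ("NEO4J_PASSWORD", "GEMINI_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build a settings dict, raising ValueError if a required variable is unset."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))

    settings: Dict[str, Any] = {key: env.get(key, default)
                                for key, default in DEFAULTS.items()}
    for key in REQUIRED:
        settings[key] = env[key]

    # Convert numeric settings from their string form.
    settings["LLM_TEMPERATURE"] = float(settings["LLM_TEMPERATURE"])
    settings["CACHE_TTL"] = int(settings["CACHE_TTL"])
    settings["PORT"] = int(settings["PORT"])
    return settings
```

Validating at startup surfaces a missing password or API key immediately, before the first query is attempted.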
Once Neo4j is running and environment variables are set, populate the knowledge graph:
Ensure the data files exist:
- `data/characters.json` - Contains 12 Marvel characters
- `data/gene_power_relationships.json` - Contains gene-power mappings

Then run the setup script:
python graph/setup_graph.py

This script will:
- Connect to Neo4j
- Create constraints and indexes
- Load character data
- Create nodes (Characters, Genes, Powers, Teams)
- Create relationships (MEMBER_OF, HAS_MUTATION, CONFERS, POSSESSES_POWER)
- Display statistics
Expected Output:
INFO - Connected to Neo4j at bolt://localhost:7687 (database: gene-forge)
INFO - Loaded 12 characters from data/characters.json
INFO - Loaded 31 gene relationships from data/gene_power_relationships.json
INFO - Constraints and indexes created
INFO - Created 4 team nodes
INFO - Created 25 power nodes
INFO - Created 31 gene nodes
INFO - Created 12 character nodes
INFO - Created 12 MEMBER_OF relationships
INFO - Created 36 HAS_MUTATION relationships
INFO - Created 45 CONFERS relationships
INFO - Created 60 POSSESSES_POWER relationships
INFO - Graph construction completed!
You can verify the graph was created successfully by:
- Opening Neo4j Browser (usually at http://localhost:7474)
- Running a test query:
MATCH (c:Character)-[:HAS_MUTATION]->(g:Gene) RETURN c.name, g.gene_name LIMIT 5
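
The same check can be scripted. The sketch below compares node counts against the figures in the expected output above. It assumes the official `neo4j` Python driver is installed and that the node labels are `Character`, `Gene`, `Power`, and `Team`. Only the first two appear in the test query, so `Power` and `Team` are assumptions. The helper names are hypothetical.

```python
from typing import Dict, List

# Node counts taken from the setup script's expected output.
# The Power and Team label names are assumptions.
EXPECTED_NODE_COUNTS: Dict[str, int] = {
    "Team": 4,
    "Power": 25,
    "Gene": 31,
    "Character": 12,
}


def find_mismatches(actual: Dict[str, int]) -> List[str]:
    """Describe every label whose count differs from the expected value."""
    problems = []
    for label, expected in EXPECTED_NODE_COUNTS.items():
        found = actual.get(label, 0)
        if found != expected:
            problems.append(f"{label}: expected {expected}, found {found}")
    return problems


def fetch_node_counts(uri: str, user: str, password: str,
                      database: str = "gene-forge") -> Dict[str, int]:
    """Count nodes per label in a running Neo4j instance."""
    # Imported here so find_mismatches stays usable without the driver installed.
    from neo4j import GraphDatabase

    counts: Dict[str, int] = {}
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session(database=database) as session:
            for label in EXPECTED_NODE_COUNTS:
                query = f"MATCH (n:{label}) RETURN count(n) AS total"
                counts[label] = session.run(query).single()["total"]
    return counts
```

With the database running, `find_mismatches(fetch_node_counts("bolt://localhost:7687", "neo4j", "<password>"))` should return an empty list if the graph matches the sample output.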
Start the web server:
python app/api.py

Or using uvicorn directly:
uvicorn app.api:app --host 127.0.0.1 --port 8000

Access the Application:
- Web UI: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
API Endpoints:
- `GET /` - Web UI for interactive queries
- `POST /question` - Answer questions using knowledge graph and LLM
- `GET /graph/{character}` - Get character's graph neighbors
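
A small client for these endpoints might look like the sketch below, which uses only the standard library. The JSON body shape `{"question": ...}` is an assumption, since the request schema is not documented here. The interactive docs at `/docs` show the actual fields. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"


def build_question_request(question: str, base_url: str = BASE_URL) -> Request:
    """Prepare a POST /question request (body field name is an assumption)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{base_url}/question",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def graph_url(character: str, base_url: str = BASE_URL) -> str:
    """Build the GET /graph/{character} URL, percent-encoding the name."""
    return f"{base_url}/graph/{quote(character, safe='')}"


def ask(question: str) -> dict:
    """Send the question to a running server and decode the JSON reply."""
    with urlopen(build_question_request(question), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Percent-encoding the character name matters for names containing spaces, which would otherwise produce an invalid URL path.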
Run the example script to see the system in action:
python example_usage.py

This demonstrates:
- Basic graph querying (GraphQAEngine)
- Direct LLM integration
- Integrated service with caching
- Dataset Documentation: See `data/DATA_README.md` for data format and structure
- Graph Schema: See `graph/GRAPH_SCHEMA.md` for detailed schema documentation
- Query Selection Layer: See `queries/graph_query_selection_layer.md` for query routing details
If you encounter issues not covered here:
- Check the logs for error messages
- Verify all prerequisites are installed
- Ensure environment variables are set correctly
- Review the example script (`example_usage.py`) for usage patterns