Project Gene-Forge

A knowledge graph and AI-powered question-answering system for Marvel character genetic mutations and powers. This system combines a Neo4j knowledge graph with Google Gemini LLM integration to provide fact-grounded answers about characters, genes, powers, and team affiliations.

Overview

Project Gene-Forge is a S.H.I.E.L.D. intelligence system that:

- Stores Marvel character data in a Neo4j knowledge graph
- Uses deterministic query routing to extract facts from the graph
- Generates natural language responses using Google Gemini LLM
- Provides both a web interface and REST API for querying
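
Deterministic routing means the same question always maps to the same Cypher query, with no LLM involved in the lookup. The following is a minimal sketch of that idea, assuming simple keyword rules. The intent names, keywords, Cypher templates, the `name` property on Power and Team nodes, and the relationship directions are illustrative assumptions, not the actual contents of queries/graph_qa.py.

```python
from typing import Dict, Optional, Tuple

# Hypothetical intent -> Cypher template table (illustrative only).
CYPHER_TEMPLATES: Dict[str, str] = {
    "mutations": (
        "MATCH (c:Character {name: $name})-[:HAS_MUTATION]->(g:Gene) "
        "RETURN g.gene_name AS gene"
    ),
    "powers": (
        "MATCH (c:Character {name: $name})-[:POSSESSES_POWER]->(p:Power) "
        "RETURN p.name AS power"
    ),
    "team": (
        "MATCH (c:Character {name: $name})-[:MEMBER_OF]->(t:Team) "
        "RETURN t.name AS team"
    ),
}

# Ordered keyword rules: the first match wins, so routing is repeatable.
INTENT_KEYWORDS = [
    ("mutations", ("gene", "mutation")),
    ("powers", ("power", "ability")),
    ("team", ("team", "member", "affiliation")),
]


def classify_intent(question: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the question."""
    text = question.lower()
    for intent, keywords in INTENT_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return intent
    return None


def route(question: str, character: str) -> Optional[Tuple[str, dict]]:
    """Map a question about a character to (cypher, parameters), or None."""
    intent = classify_intent(question)
    if intent is None:
        return None
    return CYPHER_TEMPLATES[intent], {"name": character}
```

Passing the character name as a query parameter, rather than formatting it into the Cypher string, keeps user input out of the query text.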

Key Components

- Neo4j Knowledge Graph: Stores characters, genes, powers, and teams with their relationships
- Graph Query Engine: Routes natural language questions to Cypher queries
- LLM Integration: Google Gemini API for generating fact-grounded responses
- FastAPI Web Application: REST API and web UI for interactive queries
- Response Caching: In-memory cache to reduce API calls and improve performance
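
To illustrate the response-caching component, here is a minimal sketch of an in-memory cache with a time-to-live. The class name `TTLCache`, its methods, and the key normalisation are assumptions made for illustration; the real implementation lives in llm_queries/cache.py and may differ. The default of 86400 seconds matches the documented `CACHE_TTL` default.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds (hypothetical sketch)."""

    def __init__(
        self,
        ttl_seconds: float = 86400,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Normalise case and whitespace so trivially different phrasings share an entry.
        return " ".join(question.lower().split())

    def get(self, question: str) -> Optional[Any]:
        key = self._key(question)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, question: str, value: Any) -> None:
        self._store[self._key(question)] = (self._clock() + self._ttl, value)
```

A cache hit skips both the graph lookup and the Gemini call, which is where the reduction in API calls comes from.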

Project Structure

├── app/
│   ├── api.py              # FastAPI application with REST endpoints
│   └── index.html          # Web UI for interactive queries
├── data/
│   ├── characters.json     # Character data (12 Marvel characters)
│   ├── gene_power_relationships.json  # Gene-power mappings
│   └── DATA_README.md      # Dataset documentation
├── graph/
│   ├── setup_graph.py      # Script to build Neo4j knowledge graph
│   └── GRAPH_SCHEMA.md     # Graph schema documentation
├── queries/
│   ├── graph_qa.py         # Graph query engine (entity resolution, intent classification)
│   └── graph_query_selection_layer.md  # Query routing documentation
├── llm_queries/
│   ├── llm_graph_qa.py     # Integrated LLM + Graph QA service
│   ├── llm_integration.py  # Gemini API integration
│   └── cache.py            # Response caching implementation
├── example_usage.py        # Example script demonstrating usage
└── requirements.txt        # Python dependencies

Prerequisites

Before setting up the project, ensure you have:

- Python 3.8+ installed
  - Check with: python --version or python3 --version
- Neo4j Database (Community Edition or Desktop)
  - Download from: https://neo4j.com/download/
  - Neo4j Desktop is recommended for local development
  - Ensure Neo4j is running before proceeding
- Google Gemini API Key
  - Get your API key from: https://aistudio.google.com/app/api-keys

Installation Steps

1. Install Python Dependencies

# Create a virtual environment (recommended)
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Install and Start Neo4j

Neo4j Desktop

1. Download Neo4j Desktop
2. Install and launch Neo4j Desktop
3. Create a new database (or use the default one)
4. Start the database
5. Note the connection details (URI, username, password); they are needed for the environment variables in the next step

3. Set Up Environment Variables

Create a .env file in the project root directory:

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password_here
NEO4J_DATABASE=gene-forge

# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.5-flash-lite

# Optional Configuration
LLM_TEMPERATURE=0.0
CACHE_TTL=86400
PORT=8000
HOST=127.0.0.1

Important: Replace your_neo4j_password_here with your actual Neo4j password and your_gemini_api_key_here with your Gemini API key.

Note: The project uses python-dotenv to load environment variables.
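
As a rough sketch of how these variables might be read at startup, the loader below applies the documented defaults and fails early when a required secret is missing. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project's code. In the real application, `load_dotenv()` from python-dotenv would typically be called first so that the .env values end up in `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    gemini_api_key: str
    gemini_model: str
    llm_temperature: float
    cache_ttl: int
    port: int
    host: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, using the documented defaults."""
    # These two have no default, so fail fast with a clear message.
    missing = [key for key in ("NEO4J_PASSWORD", "GEMINI_API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],
        neo4j_database=env.get("NEO4J_DATABASE", "gene-forge"),
        gemini_api_key=env["GEMINI_API_KEY"],
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash-lite"),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.0")),
        cache_ttl=int(env.get("CACHE_TTL", "86400")),
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "127.0.0.1"),
    )
```

Environment values are always strings, so numeric settings are converted explicitly.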

Configuration

Required Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `NEO4J_URI` | Neo4j connection URI | `bolt://localhost:7687` |
| `NEO4J_USERNAME` | Neo4j username | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | Required |
| `NEO4J_DATABASE` | Database name | `gene-forge` |
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `GEMINI_MODEL` | Gemini model name | `gemini-2.5-flash-lite` |

Optional Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `LLM_TEMPERATURE` | LLM temperature (0.0-1.0) | `0.0` |
| `CACHE_TTL` | Cache TTL in seconds | `86400` (24 hours) |
| `PORT` | FastAPI server port | `8000` |
| `HOST` | FastAPI server host | `127.0.0.1` |

Setting Up the Knowledge Graph

Once Neo4j is running and environment variables are set, populate the knowledge graph:

Step 1: Verify Data Files

Ensure the data files exist:

- data/characters.json - Contains 12 Marvel characters
- data/gene_power_relationships.json - Contains gene-power mappings

Step 2: Run the Graph Setup Script

python graph/setup_graph.py

This script will:

1. Connect to Neo4j
2. Create constraints and indexes
3. Load character data
4. Create nodes (Characters, Genes, Powers, Teams)
5. Create relationships (MEMBER_OF, HAS_MUTATION, CONFERS, POSSESSES_POWER)
6. Display statistics
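
The node and relationship steps above can be pictured as a series of idempotent `MERGE` statements. The sketch below only builds the Cypher text and parameters; it does not talk to Neo4j. The record shape (`name`, `team`, `genes`, `powers`), the `name` property on Team and Power nodes, and the relationship directions other than HAS_MUTATION are assumptions; see data/DATA_README.md and graph/GRAPH_SCHEMA.md for the real format. CONFERS relationships, which come from the gene-power mapping file, are left out for brevity.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, Dict[str, object]]

# Uniqueness constraints (Neo4j 4.4+/5 syntax) so that MERGE cannot create duplicates.
CONSTRAINTS = [
    "CREATE CONSTRAINT IF NOT EXISTS FOR (c:Character) REQUIRE c.name IS UNIQUE",
    "CREATE CONSTRAINT IF NOT EXISTS FOR (g:Gene) REQUIRE g.gene_name IS UNIQUE",
]


def character_statements(character: dict) -> List[Statement]:
    """Build (cypher, params) pairs for one character record (assumed shape)."""
    name = character["name"]
    statements: List[Statement] = [
        ("MERGE (c:Character {name: $name})", {"name": name}),
    ]
    team = character.get("team")
    if team:
        statements.append((
            "MATCH (c:Character {name: $name}) "
            "MERGE (t:Team {name: $team}) "
            "MERGE (c)-[:MEMBER_OF]->(t)",
            {"name": name, "team": team},
        ))
    for gene in character.get("genes", []):
        statements.append((
            "MATCH (c:Character {name: $name}) "
            "MERGE (g:Gene {gene_name: $gene}) "
            "MERGE (c)-[:HAS_MUTATION]->(g)",
            {"name": name, "gene": gene},
        ))
    for power in character.get("powers", []):
        statements.append((
            "MATCH (c:Character {name: $name}) "
            "MERGE (p:Power {name: $power}) "
            "MERGE (c)-[:POSSESSES_POWER]->(p)",
            {"name": name, "power": power},
        ))
    return statements
```

Each pair would then be executed with the driver, for example `session.run(query, params)`. Because every statement uses `MERGE`, running the sketch twice would not duplicate nodes or relationships.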

Expected Output:

INFO - Connected to Neo4j at bolt://localhost:7687 (database: gene-forge)
INFO - Loaded 12 characters from data/characters.json
INFO - Loaded 31 gene relationships from data/gene_power_relationships.json
INFO - Constraints and indexes created
INFO - Created 4 team nodes
INFO - Created 25 power nodes
INFO - Created 31 gene nodes
INFO - Created 12 character nodes
INFO - Created 12 MEMBER_OF relationships
INFO - Created 36 HAS_MUTATION relationships
INFO - Created 45 CONFERS relationships
INFO - Created 60 POSSESSES_POWER relationships
INFO - Graph construction completed!

Step 3: Verify the Graph

You can verify the graph was created successfully by:

1. Opening Neo4j Browser (usually at http://localhost:7474)
2. Running a test query:

MATCH (c:Character)-[:HAS_MUTATION]->(g:Gene)
RETURN c.name, g.gene_name
LIMIT 5
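
The same check can be run from Python instead of Neo4j Browser. This is a sketch that assumes the official `neo4j` Python driver is installed and that the environment variables from the Configuration section are set; the function names are hypothetical.

```python
import os
from typing import Any, List, Tuple

# Same query as above, with aliases and a parameterised limit.
VERIFY_QUERY = (
    "MATCH (c:Character)-[:HAS_MUTATION]->(g:Gene) "
    "RETURN c.name AS character, g.gene_name AS gene "
    "LIMIT $limit"
)


def sample_mutations(session: Any, limit: int = 5) -> List[Tuple[str, str]]:
    """Run the verification query on an open session and return (character, gene) pairs."""
    result = session.run(VERIFY_QUERY, limit=limit)
    return [(record["character"], record["gene"]) for record in result]


def run_against_neo4j() -> List[Tuple[str, str]]:
    """Open a driver from the environment and run the check (needs a live database)."""
    from neo4j import GraphDatabase  # imported here so the helper above has no hard dependency

    driver = GraphDatabase.driver(
        os.environ.get("NEO4J_URI", "bolt://localhost:7687"),
        auth=(os.environ.get("NEO4J_USERNAME", "neo4j"), os.environ["NEO4J_PASSWORD"]),
    )
    try:
        database = os.environ.get("NEO4J_DATABASE", "gene-forge")
        with driver.session(database=database) as session:
            return sample_mutations(session)
    finally:
        driver.close()
```

An empty list from `run_against_neo4j()` would suggest the setup script did not populate the database that `NEO4J_DATABASE` points to.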

Running the Application

1) FastAPI Web Application (Recommended)

Start the web server:

python app/api.py

Or using uvicorn directly:

uvicorn app.api:app --host 127.0.0.1 --port 8000

Access the Application:

- Web UI: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health

API Endpoints:

- GET / - Web UI for interactive queries
- POST /question - Answer questions using knowledge graph and LLM
- GET /graph/{character} - Get character's graph neighbors
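
A small client for these endpoints could look like the sketch below, which uses only the standard library. The JSON body field `question` is an assumption about the request schema, and the character names are only examples; check http://localhost:8000/docs for the actual models.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"


def build_question_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /question request; the 'question' field name is assumed."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/question",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def neighbors_url(character: str, base_url: str = BASE_URL) -> str:
    """Build the GET /graph/{character} URL, percent-encoding the name."""
    return base_url + "/graph/" + quote(character, safe="")


def send(request: urllib.request.Request) -> dict:
    """Send a request to a running server and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_question_request("What powers does Storm have?"))` would return the decoded JSON answer.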

2) Python Script (CLI Usage)

Run the example script to see the system in action:

python example_usage.py

This demonstrates:

- Basic graph querying (GraphQAEngine)
- Direct LLM integration
- Integrated service with caching
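
The integrated flow can be pictured roughly as: check the cache, fetch facts from the graph, build a prompt that restricts the model to those facts, then store the response. The sketch below uses injected callables in place of the real graph engine and Gemini client; all function names and the prompt wording are hypothetical and do not reflect the actual code in llm_queries/.

```python
from typing import Callable, List, MutableMapping


def build_grounded_prompt(question: str, facts: List[str]) -> str:
    """Compose a prompt that asks the model to answer only from graph facts."""
    fact_lines = "\n".join(f"- {fact}" for fact in facts) or "- (no facts found)"
    return (
        "Answer using only the facts below. "
        "If they are insufficient, say so.\n\n"
        f"Facts:\n{fact_lines}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    fetch_facts: Callable[[str], List[str]],
    generate: Callable[[str], str],
    cache: MutableMapping[str, str],
) -> str:
    """Return a cached response if present; otherwise query the graph, then the LLM."""
    if question in cache:
        return cache[question]
    facts = fetch_facts(question)            # stand-in for the graph query engine
    response = generate(build_grounded_prompt(question, facts))  # stand-in for Gemini
    cache[question] = response
    return response
```

Keeping the graph lookup and the LLM call behind plain callables makes the flow easy to exercise with stubs, without a database or an API key.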

Additional Resources

Project Documentation

- Dataset Documentation: See data/DATA_README.md for data format and structure
- Graph Schema: See graph/GRAPH_SCHEMA.md for detailed schema documentation
- Query Selection Layer: See queries/graph_query_selection_layer.md for query routing details

If you encounter issues not covered here:

- Check the logs for error messages
- Verify all prerequisites are installed
- Ensure environment variables are set correctly
- Review the example script (example_usage.py) for usage patterns
