-
Notifications
You must be signed in to change notification settings - Fork 5.4k
feat: Authentication [DO NOT MERGE] #6107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. |
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment was marked as outdated.
This comment was marked as outdated.
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
|
@claude full review |
|
Claude finished @standujar's task —— View job Full PR Review for Authentication, Entity-level RLS & Security ImprovementsAfter a comprehensive review of this major security enhancement PR, here are my findings and recommendations: Review Checklist
🟢 Overall Assessment: Excellent ImplementationThis PR demonstrates exceptional engineering quality with comprehensive security improvements, thorough testing, and excellent documentation. The implementation follows security best practices and maintains backward compatibility. 🔒 Security AnalysisStrengths
Security Issues Identified🚨 Critical: Missing Rate LimitingLocation: The auth endpoints lack rate limiting, making them vulnerable to brute force attacks: // Missing rate limiting middleware before JWT auth
this.app.use('/api', jwtAuthMiddleware);Recommendation: Add rate limiting middleware before JWT authentication: import rateLimit from 'express-rate-limit';
const authRateLimit = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // limit each IP to 100 requests per windowMs
message: 'Too many authentication attempts, please try again later.'
});
this.app.use('/api/auth', authRateLimit);
this.app.use('/api', jwtAuthMiddleware);
|
0xbbjoker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR #6107 Analysis - Breaking Changes & Discussion Points
Link: #6107
This PR is massive and should be split into smaller, reviewable chunks. Key concerns:
1. User Table - Do we need it?
- Adds
userstable with email/password auth (bcrypt hashing) - Conflict risk: Will clash with cloud schema design
- Question: Is this core functionality or should auth be pluggable/external (Privy, Auth0, CDP)?
2. JWT Implementation - Why JWT?
- Adds full JWT auth stack (
joselibrary, refresh tokens, 7-day expiry) - Routes:
/api/auth/register,/api/auth/login,/api/auth/refresh,/api/auth/me - Only enabled when
ENABLE_DATA_ISOLATION=true - Question: Should this be core or let end-users implement their own auth provider? Makes ElizaOS more opinionated.
3. RLS Implementation - This we need
- Two-layer security: Server RLS (multi-tenant) + Entity RLS (user isolation)
- Server RLS: Isolates different ElizaOS instances via
server_idcolumn - Entity RLS: Isolates user data via
entity_id(DM privacy, multi-user) - Cloud-ready: This is valuable for cloud deployments
- Recommendation: Extract RLS-only changes into separate PR
4. Breaking API Changes
- New domain-based routing:
/api/messaging,/api/agents,/api/memory,/api/audio,/api/auth - Removed:
/worldroutes (moved to/messaging/spaces) - Sessions API: New unified messaging interface (
/api/messaging/sessions,/api/messaging/jobs) - Impact: All existing API consumers will break
- Need: Migration guide + deprecation period
Proposed Breakdown
- PR 1: RLS infrastructure only (server + entity isolation) - ship this for cloud
- PR 2: API restructuring (domain routing, sessions API) - with migration docs
- PR 3: Auth system (JWT + user table) - needs architecture discussion first
Critical Questions
- Do we want opinionated auth or remain auth-agnostic?
- Can we align user table schema with planned cloud schema?
Thanks a lot for the detailed review !! I fully agree that this needs to be split. About the JWT/User part: About the API changes: Next steps: |
…ity isolation features
…nt for DM channel management
…clarity and consistency in messaging functionality
…o maintain backward compatibility
… conventions, ensuring backward compatibility with messageServerId
…or entities and servers
…ionality across the codebase for improved clarity and consistency in messaging operations
…ved clarity and consistency
…server_agents' and adjust related identifiers for consistency in migration tests
…egration tests to verify access control
…proper data removal and handle potential errors
…luding user registration, login, and retrieval methods
…onsistency across SQL plugin
- Implemented `isRoomParticipant` method in the `DatabaseAdapter` interface and its SQL implementation to efficiently check if an entity is a participant in a room. - Added `isChannelParticipant` method in the `BaseDrizzleAdapter` for channel participant verification. - Updated relevant tests to validate the new participant check functionalities. - Enhanced authentication middleware to support JWT and API key checks for secure access to channels and rooms.
… messaging channels and core
…ed dependencies and add some tests
…te token generation to async
… and protected features, including session management and API mocking
- Reorganize tests into unit/ and integration/ folders - Add SocketIOClientFixture and JWTTestHelper utilities - Fix JWKS verifier test to use manual fetch mock (bun-fetch-mock incompatible) - Fix SocketIO auth test for DATA_ISOLATION=false case - Add PostgreSQL detection to migrations (skip on SQLite) - Rename bootstrap-autoload.test.ts → plugin-auto-injection.test.ts - Rename middleware-chain.test.ts → auth-middleware-chain.test.ts - Remove jobs-message-flow.test.ts (merged into other integration tests)
797ca19 to
c8c9105
Compare
…ndant test categories
Summary
This PR implements four major improvements to ElizaOS's security and data architecture:
serverIdtomessageServerIdfor clarityisRoomParticipant,isChannelParticipant)All changes maintain full backward compatibility with existing deployments and plugins.
Table of Contents
Architecture Overview
1. Entity-Level Row Level Security (RLS)
Problem: ElizaOS needed fine-grained access control to isolate data by entity (users, agents, bots, etc.) within the same database.
Solution: Implemented PostgreSQL RLS policies that automatically filter data based on the current entity context
Benefits:
Implementation:
current_entity_id()PostgreSQL function to track current entity context viaapp.entity_idsession variableadd_entity_isolation()function to apply RLS policies to tablesentityIdorauthorIdcolumnsroomIdthat join toparticipantstable2. Server-Level Row Level Security (RLS)
Problem: ElizaOS needed multi-tenant isolation to prevent data leakage between different server instances (deployments, environments).
Solution: Implemented PostgreSQL RLS policies that automatically isolate data by ElizaOS server instance.
Already implemented: #6101
3. Semantic Clarity:
serverIdvsmessageServerIdProblem Statement
Why
serverIdwas problematicThe term
serverIdwas ambiguous and created confusion in the codebase for multiple reasons:1. Semantic Ambiguity
The name
serverIddoesn't clearly indicate what type of server it refers to. In a distributed system like ElizaOS, "server" could mean:This ambiguity made code harder to read and maintain.
2. Conflict with Row Level Security (RLS)
ElizaOS uses PostgreSQL Row Level Security for multi-tenant isolation. In this context:
server_idin RLS refers to the ElizaOS server instance (for tenant isolation)serverIdin messaging refers to external message platforms (Discord guild, Telegram bot, etc.)Key distinction:
server_id = "abc-123") can connect to MULTIPLE message serversmessageServerId = "discord-1",messageServerId = "discord-2")messageServerId = "telegram-1")This dual meaning created confusion:
3. Developer Confusion
When working on features involving both RLS and messaging:
server_id(tenant isolation)serverId(message platform)Same name, completely different concepts → bugs and confusion
4. API Inconsistency
API routes like
/api/agents/:agentId/servers/:serverId/channelsdidn't clearly communicate thatserverIdrefers to a messaging platform, not an ElizaOS server.Solution
Rename message-related
serverIdtomessageServerIdto:server_idfor tenant isolation/api/agents/:agentId/message-servers/:messageServerId/channelsare crystal clear4. JWT Authentication - Complete Implementation
Problem: ElizaOS needed secure authentication for WebSocket (SocketIO) and REST API connections, with proper entity isolation.
Solution: Implemented comprehensive JWT authentication infrastructure with multi-provider support.
Key Features Implemented
Universal JWT Verifier (
middleware/jwt-verifier.ts)stringToUuid(jwt.sub)Authentication Endpoints (
api/auth/credentials.ts)POST /api/auth/register- User registration with email/passwordPOST /api/auth/login- User authenticationPOST /api/auth/refresh- Token refreshGET /api/auth/me- Current user infoDatabase Integration
userstable schema with proper indexesgetUserByEmail(email: string): Promise<User | null>getUserByUsername(username: string): Promise<User | null>getUserById(id: UUID): Promise<User | null>createUser(user: User): Promise<User>updateUserLastLogin(userId: UUID): Promise<void>Middleware Chain
Layer 1: API Key middleware (
middleware/api-key.ts)X-API-KEYheaderELIZA_SERVER_AUTH_TOKENconfiguredLayer 2: JWT middleware (
middleware/jwt.ts)req.userIdfor downstream handlersENABLE_DATA_ISOLATION=trueWebSocket Authentication
ENABLE_DATA_ISOLATION=false)Multi-Provider Architecture
Supports two authentication modes simultaneously:
External Provider Mode (JWKS)
userstable requiredCustom Auth Mode (Shared Secret)
Entity Creation Strategy
ensureConnection()message.ts)Two-Layer Security Model
Layer 1: Authentication (JWT Validation)
Layer 2: Authorization (Participant Checking)
isChannelParticipant()database check5. Performance Optimization: Participant Checking
Problem: Checking if an entity is a participant required loading ALL participants into memory and using
.some()- O(n) complexity.Solution: Added direct database existence checks - O(1) complexity.
New Methods:
isRoomParticipant(entityId, roomId)- Direct DB queryisChannelParticipant(entityId, channelId)- Direct DB queryBenefits:
6. RLS Security for Junction Table
Problem Identified: Without RLS on
message_server_agents, Server A could see the existence of Discord/Telegram servers linked to Server B's agents.Solution: The RLS system automatically adds isolation to the junction table:
server_id UUID DEFAULT current_server_id()columnserver_isolation_policyfor complete isolationTest Coverage
All Tests Passing ✅
RLS Tests: 77 tests pass, 0 fail, 173 expect() calls
Participant Tests: 11 tests pass, 0 fail, 27 expect() calls
participant.test.ts(3 new forisRoomParticipant)messaging.test.ts(2 new forisChannelParticipant)JWT Tests: 15+ tests pass
Test Files
Unit Tests - Entity RLS (
entity-rls.test.ts)roomId>entityId>authorId)Integration Tests - Entity RLS (
rls-entity.test.ts)Integration Tests - message_server_agents (
rls-message-server-agents.test.ts)server_idautomatically set viaDEFAULT current_server_id()JWT Authentication Tests (
jwt-middleware.test.ts,auth-flow.test.ts)Room Integration Tests (5/5 passing)
should map messageServerId to serverId for backward compatibilityBreaking Changes
None. All changes are fully backward compatible.
Benefits
Code Clarity
messageServerIdrefers toisChannelParticipant()vs generic checksSecurity
Developer Experience
server_idand messagingserverIdAuthentication & Authorization
Testing
Database Migration
Automatic Migration System
The migration system automatically handles:
server_agents→message_server_agentsserver_id→message_server_idin junction tableuserstable with proper schema and indexesserver_idcolumn withDEFAULT current_server_id()to all tablesDeveloper experience:
RLS Architecture Details
Three-Layer Security Model
Layer 1: Server RLS (Multi-Tenant Isolation)
server_idfor isolationapp.server_idtransaction-local variableLayer 2: Entity RLS (User Privacy Isolation)
entityId,authorId, or joins viaparticipantstableapp.entity_idtransaction-local variableLayer 3: Application Layer (JWT)
All three layers stack - a user can only see data from their server AND their accessible entities AND that they have permission to access.
Excluded Tables (with rationale)
Entity RLS Exclusions:
users- Authentication table (no entity isolation needed)entity_mappings- Cross-platform entity mappingdrizzle_migrations,__drizzle_migrations- Migration trackingagents- Shared across entitiesowners- RLS management tableAll other tables receive RLS automatically based on their column structure.
Configuration
Environment Variables
Authentication Matrix
Summary
This PR delivers a complete security and performance overhaul:
Impact:
Next Steps: