Skip to content

Conversation

@standujar
Copy link
Collaborator

@standujar standujar commented Nov 2, 2025

Summary

This PR implements four major improvements to ElizaOS's security and data architecture:

  1. Entity-Level Row Level Security (RLS) - PostgreSQL RLS policies for entity-based data isolation
  2. Semantic Clarity Refactoring - Renames serverId to messageServerId for clarity
  3. JWT Authentication Foundation - Complete JWT implementation with multi-provider support
  4. Performance Optimization - Efficient participant checking methods (isRoomParticipant, isChannelParticipant)

All changes maintain full backward compatibility with existing deployments and plugins.


Table of Contents

  1. Architecture Overview
  2. Test Coverage

Architecture Overview

1. Entity-Level Row Level Security (RLS)

Problem: ElizaOS needed fine-grained access control to isolate data by entity (users, agents, bots, etc.) within the same database.

Solution: Implemented PostgreSQL RLS policies that automatically filter data based on the current entity context

Benefits:

  • Data Isolation: Each entity only sees its own data
  • Automatic Enforcement: RLS is enforced at the database level, preventing accidental data leaks
  • Performance: Database-level filtering is more efficient than application-level checks
  • Security: Even if application code has bugs, RLS prevents unauthorized access

Implementation:

  • Added current_entity_id() PostgreSQL function to track current entity context via app.entity_id session variable
  • Created add_entity_isolation() function to apply RLS policies to tables
  • Two isolation strategies:
    • Direct ownership: Tables with entityId or authorId columns
    • Shared access: Tables with roomId that join to participants table
  • Integrated with entity context management in ElizaOS core

2. Server-Level Row Level Security (RLS)

Problem: ElizaOS needed multi-tenant isolation to prevent data leakage between different server instances (deployments, environments).

Solution: Implemented PostgreSQL RLS policies that automatically isolate data by ElizaOS server instance.

Already implemented: #6101


3. Semantic Clarity: serverId vs messageServerId

Problem Statement

Why serverId was problematic

The term serverId was ambiguous and created confusion in the codebase for multiple reasons:

1. Semantic Ambiguity

The name serverId doesn't clearly indicate what type of server it refers to. In a distributed system like ElizaOS, "server" could mean:

  • Message servers (Discord, Telegram, Slack)
  • Application servers (ElizaOS instances)
  • Database servers
  • Authentication servers

This ambiguity made code harder to read and maintain.

2. Conflict with Row Level Security (RLS)

ElizaOS uses PostgreSQL Row Level Security for multi-tenant isolation. In this context:

  • server_id in RLS refers to the ElizaOS server instance (for tenant isolation)
  • serverId in messaging refers to external message platforms (Discord guild, Telegram bot, etc.)

Key distinction:

  • ONE ElizaOS server instance (server_id = "abc-123") can connect to MULTIPLE message servers
    • Discord guilds (messageServerId = "discord-1", messageServerId = "discord-2")
    • Telegram bots (messageServerId = "telegram-1")

This dual meaning created confusion:

// Which serverId is this? ElizaOS instance or Discord guild?
const room = await adapter.getRoom({ serverId, roomId });

// Is this filtering by tenant or by Discord server?
await adapter.getRoomsByServerId(serverId);

3. Developer Confusion

When working on features involving both RLS and messaging:

  • Setting RLS policies with server_id (tenant isolation)
  • Querying rooms by serverId (message platform)

Same name, completely different concepts → bugs and confusion

4. API Inconsistency

API routes like /api/agents/:agentId/servers/:serverId/channels didn't clearly communicate that serverId refers to a messaging platform, not an ElizaOS server.

Solution

Rename message-related serverId to messageServerId to:

  • Clearly indicate purpose: It's the ID of an external messaging platform
  • Avoid RLS conflicts: RLS continues using server_id for tenant isolation
  • Improve maintainability: Code is self-documenting and semantically clear
  • Better API design: Routes like /api/agents/:agentId/message-servers/:messageServerId/channels are crystal clear

4. JWT Authentication - Complete Implementation

Problem: ElizaOS needed secure authentication for WebSocket (SocketIO) and REST API connections, with proper entity isolation.

Solution: Implemented comprehensive JWT authentication infrastructure with multi-provider support.

Key Features Implemented

Universal JWT Verifier (middleware/jwt-verifier.ts)

  • Supports JWKS verification (Privy, CDP, Auth0, any OAuth provider)
  • Supports shared secret verification (custom self-hosted auth)
  • Deterministic entityId generation: stringToUuid(jwt.sub)
  • Issuer whitelist validation
  • JWKS caching (1 hour) for performance
  • Graceful error handling

Authentication Endpoints (api/auth/credentials.ts)

  • POST /api/auth/register - User registration with email/password
  • POST /api/auth/login - User authentication
  • POST /api/auth/refresh - Token refresh
  • GET /api/auth/me - Current user info
  • Password hashing with bcrypt (cost factor 10)
  • Email uniqueness validation
  • Input validation and sanitization

Database Integration

  • users table schema with proper indexes
  • User management methods in DatabaseAdapter:
    • getUserByEmail(email: string): Promise<User | null>
    • getUserByUsername(username: string): Promise<User | null>
    • getUserById(id: UUID): Promise<User | null>
    • createUser(user: User): Promise<User>
    • updateUserLastLogin(userId: UUID): Promise<void>
  • Implementations in BaseDrizzleAdapter, DatabaseAdapter, and AgentRuntime

Middleware Chain

  • Layer 1: API Key middleware (middleware/api-key.ts)

    • Authenticates frontend → server connection
    • Validates X-API-KEY header
    • Active if ELIZA_SERVER_AUTH_TOKEN configured
    • Allows CORS preflight (OPTIONS)
  • Layer 2: JWT middleware (middleware/jwt.ts)

    • Authenticates user identity
    • Extracts entityId from JWT
    • Sets req.userId for downstream handlers
    • Active if ENABLE_DATA_ISOLATION=true
    • Supports both JWKS and shared secret modes

WebSocket Authentication

  • Production-grade JWT verification for SocketIO
  • Handshake authentication with JWT token
  • Entity context extraction from verified tokens
  • Backward compatible dev mode (accepts entityId directly when ENABLE_DATA_ISOLATION=false)
  • Proper error handling and logging

Multi-Provider Architecture

Supports two authentication modes simultaneously:

  1. External Provider Mode (JWKS)

    JWT_JWKS_URI=https://auth.privy.io/.well-known/jwks.json
    JWT_ISSUER_WHITELIST=privy.io,cdp.coinbase.com
    • No users table required
    • Provider handles user management
    • Public key verification
    • Change provider = change env var only
  2. Custom Auth Mode (Shared Secret)

    JWT_SECRET=your-256-bit-secret
    POSTGRES_URL=postgresql://...
    • Full control over user management
    • Custom registration/login flows
    • Works without external dependencies
    • Stable entityId generation

Entity Creation Strategy

  • Lazy entity creation via ensureConnection()
  • No duplicate entities (fixed critical bug in message.ts)
  • Web users have single entity across SocketIO and HTTP
  • Consistent entity identity across transport mechanisms

Two-Layer Security Model

Layer 1: Authentication (JWT Validation)

  • Question: "Who are you?"
  • Implementation: JWT token validation
  • State: Production ready (JWKS + shared secret)
  • Features: Signature validation, expiration checks, issuer validation

Layer 2: Authorization (Participant Checking)

  • Question: "Can you access THIS specific channel?"
  • Implementation: isChannelParticipant() database check
  • State: Fully implemented
  • Security: Prevents cross-channel data leakage

5. Performance Optimization: Participant Checking

Problem: Checking if an entity is a participant required loading ALL participants into memory and using .some() - O(n) complexity.

Solution: Added direct database existence checks - O(1) complexity.

New Methods:

  • isRoomParticipant(entityId, roomId) - Direct DB query
  • isChannelParticipant(entityId, channelId) - Direct DB query

Benefits:

  • Constant time complexity - O(1) instead of O(n)
  • Lower memory usage - No loading all participants
  • Better scalability - Handles rooms with 1000+ participants
  • Database indexes - Optimized queries

6. RLS Security for Junction Table

Problem Identified: Without RLS on message_server_agents, Server A could see the existence of Discord/Telegram servers linked to Server B's agents.

Solution: The RLS system automatically adds isolation to the junction table:

  • Adds server_id UUID DEFAULT current_server_id() column
  • Creates server_isolation_policy for complete isolation
  • Server A cannot see or modify Server B's message server associations

Test Coverage

All Tests Passing ✅

RLS Tests: 77 tests pass, 0 fail, 173 expect() calls

Participant Tests: 11 tests pass, 0 fail, 27 expect() calls

  • 5 tests in participant.test.ts (3 new for isRoomParticipant)
  • 6 tests in messaging.test.ts (2 new for isChannelParticipant)

JWT Tests: 15+ tests pass

  • Unit tests for JWT middleware
  • Token generation and verification
  • Expiration handling
  • Multi-provider authentication
  • Integration tests for auth flow

Test Files

Unit Tests - Entity RLS (entity-rls.test.ts)

  • Column detection priority (roomId > entityId > authorId)
  • Policy generation (STRICT vs PERMISSIVE modes)
  • Isolation behavior logic

Integration Tests - Entity RLS (rls-entity.test.ts)

  • Entity isolation (Alice, Bob, Charlie)
  • Participant-based access control (room membership)
  • Combined Server RLS + Entity RLS (double isolation)

Integration Tests - message_server_agents (rls-message-server-agents.test.ts)

  • Isolation: Server A sees only its 2 associations, Server B sees only its 1
  • Auto-population: server_id automatically set via DEFAULT current_server_id()
  • Query blocking: Server A queries Server B's message server → 0 results
  • Modification blocking: Server B tries to delete Server A's association → blocked
  • JOIN protection: Cross-server JOINs filtered correctly
  • Schema validation: Policy and DEFAULT constraint verified

JWT Authentication Tests (jwt-middleware.test.ts, auth-flow.test.ts)

  • JWT token generation and verification
  • Token expiration handling
  • Invalid token rejection
  • Dual authentication (JWT or API Key)
  • Complete auth flow (register → login → API access)
  • User management operations
  • WebSocket authentication

Room Integration Tests (5/5 passing)

  • Added test: should map messageServerId to serverId for backward compatibility
  • Verifies both fields are populated correctly

Breaking Changes

None. All changes are fully backward compatible.


Benefits

Code Clarity

  • Developers immediately understand what messageServerId refers to
  • Self-documenting code: No additional comments needed
  • Clear method names: isChannelParticipant() vs generic checks

Security

  • Three-layer security: JWT Authentication + Authorization (participant checking) + RLS
  • Complete RLS isolation for both Server-level and Entity-level data
  • Fail-closed security model (deny access on errors)
  • Database-enforced isolation (can't be bypassed by application bugs)
  • Zero Configuration: RLS policies apply automatically to all tables
  • Production-grade JWT with signature verification and expiration

Developer Experience

  • Reduced Bugs: No more confusion between RLS server_id and messaging serverId
  • Better Onboarding: New developers don't need to guess which "server" is referenced
  • Future-Proof: Clear naming prevents similar ambiguities in future development
  • Backward Compatible: Existing code continues to work
  • Type Safety: TypeScript guides migration with deprecation warnings
  • Multi-Provider Flexibility: Switch auth providers without code changes

Authentication & Authorization

  • Universal JWT Support: Works with any JWT provider (Privy, CDP, Auth0, custom)
  • Zero Lock-in: Change provider = change env var only
  • Dual Authentication: JWT or API Key (backward compatible)
  • Deterministic Entity IDs: Stable user identity across sessions
  • Secure by Default: Production-grade password hashing (bcrypt)
  • WebSocket Security: Full JWT support for real-time connections

Testing

  • Comprehensive Testing: 103+ total tests ensure complete coverage
  • All tests passing: 0 failures, 200+ assertions
  • Integration tests: Real database scenarios
  • Performance tests: Verify optimization improvements
  • Security tests: JWT verification, token expiration, unauthorized access

Database Migration

Automatic Migration System

The migration system automatically handles:

  1. Table rename: server_agentsmessage_server_agents
  2. Column rename: server_idmessage_server_id in junction table
  3. Users table: Creates users table with proper schema and indexes
  4. RLS automatic application:
    • Adds server_id column with DEFAULT current_server_id() to all tables
    • Creates indexes for performance
    • Applies isolation policies (both server-level and entity-level)
    • Handles backfill for existing data

Developer experience:

  • Zero configuration required - just update and restart
  • No manual SQL scripts - everything is automated
  • Idempotent and safe - can be run multiple times without issues

RLS Architecture Details

Three-Layer Security Model

Layer 1: Server RLS (Multi-Tenant Isolation)

  • Isolates data between different ElizaOS server instances
  • Uses server_id for isolation
  • Context set via app.server_id transaction-local variable

Layer 2: Entity RLS (User Privacy Isolation)

  • Isolates data between different entities within a server
  • Uses entityId, authorId, or joins via participants table
  • Context set via app.entity_id transaction-local variable
  • Provides DM privacy and multi-user isolation

Layer 3: Application Layer (JWT)

  • JWT authentication verifies user identity

All three layers stack - a user can only see data from their server AND their accessible entities AND that they have permission to access.

Excluded Tables (with rationale)

Entity RLS Exclusions:

  • users - Authentication table (no entity isolation needed)
  • entity_mappings - Cross-platform entity mapping
  • drizzle_migrations, __drizzle_migrations - Migration tracking
  • agents - Shared across entities
  • owners - RLS management table

All other tables receive RLS automatically based on their column structure.


Configuration

Environment Variables

# Layer 1: Frontend→Server authentication (API Key)
ELIZA_SERVER_AUTH_TOKEN=your-secret-key    # Optional: Secure frontend→server connection

# Layer 2: User authentication (JWT for data isolation)
ENABLE_DATA_ISOLATION=true                  # Enable multi-user data isolation + JWT auth

# JWT Configuration - Choose ONE mode:

# Mode 1: External Provider (JWKS)
JWT_JWKS_URI=https://auth.privy.io/.well-known/jwks.json  # JWKS URI
JWT_ISSUER_WHITELIST=privy.io,cdp.coinbase.com            # Allowed issuers

# Mode 2: Custom Auth (Shared Secret)
JWT_SECRET=your-256-bit-secret             # JWT signing secret (generate with crypto.randomBytes(32))

# RLS Configuration
ENABLE_RLS_ISOLATION=true                  # Enable Row Level Security
RLS_OWNER_ID=my-server-uuid               # Server instance ID for multi-tenant isolation

Authentication Matrix

Mode JWT_JWKS_URI JWT_SECRET Users Table Use Case
Privy ✅ Set ❌ Not needed Web3 auth, wallet login
CDP ✅ Set ❌ Not needed Coinbase wallet auth
Custom ✅ Set ✅ Required Email/password, self-hosted
Hybrid ✅ Set ✅ Set ✅ Required Multiple auth methods

Summary

This PR delivers a complete security and performance overhaul:

  • Entity-Level RLS: Automatic data isolation at the database level
  • Semantic Clarity: Clear naming eliminates confusion
  • JWT Authentication: Complete multi-provider implementation with universal verification
  • WebSocket Security: Production-grade JWT authentication for real-time connections
  • Performance Optimization: O(1) participant checking
  • 100% Backward Compatible: Zero breaking changes
  • Fully Tested: 103+ tests, 0 failures
  • Production Ready: Auto-migration, fail-closed security, multi-provider support

Impact:

  • 🔒 Stronger security with three-layer isolation (Server RLS + Entity RLS + JWT)
  • 📖 Clearer code with semantic naming
  • 🚀 Better performance with optimized queries
  • 🔐 Production-grade authentication with multi-provider JWT support
  • 🌐 Universal compatibility works with any JWT provider
  • 🔧 Developer-friendly zero-config automatic migrations

Next Steps:

  • Phase 3: ACL Service - Other PR
  • Phase 4: Use ElizaOS cloud as provider

@standujar standujar marked this pull request as draft November 2, 2025 16:12
@graphite-app
Copy link

graphite-app bot commented Nov 2, 2025

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • merge-queue - adds this PR to the back of the merge queue
  • merge-queue-hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 2, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/entity-rls

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude

This comment was marked as outdated.

@standujar standujar requested a review from wtfsayo November 19, 2025 10:46
@standujar standujar changed the title feat: implement entity-level row level security feat: Authentication & Entity-level RLS & Security Improvements Nov 19, 2025
@standujar standujar changed the title feat: Authentication & Entity-level RLS & Security Improvements feat: Authentication, Entity-level RLS & Security Improvements Nov 19, 2025
@standujar
Copy link
Collaborator Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@standujar
Copy link
Collaborator Author

@claude full review

@claude
Copy link
Contributor

claude bot commented Nov 20, 2025

Claude finished @standujar's task —— View job


Full PR Review for Authentication, Entity-level RLS & Security Improvements

After a comprehensive review of this major security enhancement PR, here are my findings and recommendations:

Review Checklist

  • Architecture and design review
  • Security implementation review (JWT, RLS, auth middleware)
  • Database schema and migration review
  • Test coverage and quality assessment
  • Performance and compatibility analysis
  • Code quality and best practices review

🟢 Overall Assessment: Excellent Implementation

This PR demonstrates exceptional engineering quality with comprehensive security improvements, thorough testing, and excellent documentation. The implementation follows security best practices and maintains backward compatibility.


🔒 Security Analysis

Strengths

  1. Multi-Layer Security Architecture: Excellent three-layer security model:

    • Layer 1: API Key authentication (frontend ↔ server)
    • Layer 2: JWT authentication (user identity)
    • Layer 3: Database-level RLS isolation
  2. Robust JWT Implementation:

    • Universal verifier supporting multiple providers (JWKS, Ed25519, shared secret)
    • Proper signature verification using jose library
    • Deterministic entity ID generation via stringToUuid()
    • Issuer whitelist validation
    • Secure password hashing with bcrypt (cost factor 10)
  3. Database-Level RLS:

    • Automatic policy enforcement at PostgreSQL level
    • Both server-level and entity-level isolation
    • Fail-closed security model (deny by default)
  4. Performance Optimizations:

    • O(1) participant checking with isRoomParticipant() and isChannelParticipant()
    • Efficient database existence checks vs O(n) memory operations

Security Issues Identified

🚨 Critical: Missing Rate Limiting

Location: packages/server/src/index.ts:785

The auth endpoints lack rate limiting, making them vulnerable to brute force attacks:

// Missing rate limiting middleware before JWT auth
this.app.use('/api', jwtAuthMiddleware);

Recommendation: Add rate limiting middleware before JWT authentication:

import rateLimit from 'express-rate-limit';

const authRateLimit = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per windowMs
  message: 'Too many authentication attempts, please try again later.'
});

this.app.use('/api/auth', authRateLimit);
this.app.use('/api', jwtAuthMiddleware);

⚠️ Medium: Localhost Bypass in Production

Location: packages/server/src/middleware/jwt-auth.ts:56-65

The localhost bypass could be problematic in containerized environments:

const isLocalhost = req.ip === '127.0.0.1' ||
                   req.ip === '::1' ||
                   req.ip === '::ffff:127.0.0.1' ||
                   req.hostname === 'localhost';

Recommendation: Add explicit environment check:

const isDevelopment = process.env.NODE_ENV === 'development';
if (isLocalhost && isDevelopment) {
  // Allow bypass only in development
}

⚠️ Medium: Broad Error Information

Location: packages/server/src/middleware/jwt-auth.ts:94-98

JWT errors expose detailed failure reasons:

return res.status(401).json({
  error: 'Invalid JWT token',
  details: error.message, // ⚠️ Potentially sensitive
});

Recommendation: Limit error details in production:

const isDevelopment = process.env.NODE_ENV === 'development';
return res.status(401).json({
  error: 'Invalid JWT token',
  ...(isDevelopment && { details: error.message })
});

🏗️ Architecture Analysis

Excellent Design Decisions

  1. Semantic Clarity Refactoring: The serverIdmessageServerId rename brilliantly resolves the ambiguity between:

    • ElizaOS server instances (for RLS tenant isolation)
    • External message platforms (Discord, Telegram, etc.)
  2. Universal JWT Verifier: Elegant factory pattern supporting multiple providers without vendor lock-in

  3. Automatic Migration System: Zero-configuration database migrations with idempotent operations

  4. Comprehensive Test Coverage: 103+ tests with 0 failures across all components

Code Quality

  • Excellent TypeScript usage with proper type safety
  • Comprehensive error handling throughout
  • Clear documentation with detailed examples
  • Consistent naming conventions and code style
  • Proper separation of concerns between layers

🔍 Database & Migration Review

Strengths

  1. Robust RLS Implementation:

    • add_entity_isolation() function handles both direct ownership and participant-based access
    • Automatic policy application to all relevant tables
    • Proper handling of junction tables (message_server_agents)
  2. Migration Safety:

    • Idempotent operations (safe to run multiple times)
    • Backward compatibility maintained
    • Comprehensive test coverage for migration scenarios
  3. Performance Considerations:

    • Proper indexes on RLS columns
    • Efficient participant checking methods
    • Database-level filtering vs application-level

Minor Concerns

  1. RLS Exclusion Logic: The excluded tables list in RLS setup could benefit from more detailed documentation about why each table is excluded.

  2. Migration Rollback: While migrations are comprehensive, rollback procedures aren't explicitly documented.


🧪 Test Coverage Analysis

Outstanding Test Suite

  • 77 RLS tests with comprehensive entity isolation scenarios
  • 15+ JWT authentication tests covering all verifier types
  • 11 participant checking tests validating performance optimizations
  • Integration tests for real database scenarios
  • Migration tests for production scenarios

Test Quality

  • Real database testing (not just mocks)
  • Edge case coverage (expired tokens, invalid JWTs, cross-tenant access)
  • Performance validation (O(1) vs O(n) operations)
  • Security boundary testing (RLS isolation verification)

🚀 Performance & Compatibility

Performance Improvements

Participant Checking: O(1) database queries vs O(n) memory operations
RLS Filtering: Database-level vs application-level filtering
JWT Caching: JWKS endpoint caching (1 hour)

Backward Compatibility

Zero Breaking Changes: All existing APIs continue to work
Graceful Degradation: Features work without JWT when ENABLE_DATA_ISOLATION=false
Migration Safety: Automatic schema updates with data preservation


📝 Recommendations

Must Fix

  1. Add rate limiting to authentication endpoints (Critical)
  2. Restrict localhost bypass to development only (Medium)
  3. Limit error details in production JWT responses (Medium)

Consider for Future

  1. Session Management: Add JWT refresh token rotation
  2. Audit Logging: Log authentication events for security monitoring
  3. Role-Based Access: Extend beyond entity-level to role-based permissions
  4. API Versioning: Consider versioning for future auth changes

🎯 Final Verdict

This is an exemplary security implementation that significantly enhances ElizaOS's enterprise readiness. The multi-layered approach, comprehensive testing, and attention to backward compatibility make this a production-ready enhancement.

Security Score: 8.5/10 (excellent with minor fixes needed)
Code Quality: 9/10 (outstanding)
Test Coverage: 9/10 (comprehensive)
Documentation: 9/10 (excellent)

Recommendation: Approve with minor security fixes for rate limiting and localhost bypass restrictions.


The implementation demonstrates deep understanding of security architecture and represents a significant improvement to ElizaOS's security posture. Excellent work! 🎉


Copy link
Collaborator

@0xbbjoker 0xbbjoker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #6107 Analysis - Breaking Changes & Discussion Points

Link: #6107

This PR is massive and should be split into smaller, reviewable chunks. Key concerns:

1. User Table - Do we need it?

  • Adds users table with email/password auth (bcrypt hashing)
  • Conflict risk: Will clash with cloud schema design
  • Question: Is this core functionality or should auth be pluggable/external (Privy, Auth0, CDP)?

2. JWT Implementation - Why JWT?

  • Adds full JWT auth stack (jose library, refresh tokens, 7-day expiry)
  • Routes: /api/auth/register, /api/auth/login, /api/auth/refresh, /api/auth/me
  • Only enabled when ENABLE_DATA_ISOLATION=true
  • Question: Should this be core or let end-users implement their own auth provider? Makes ElizaOS more opinionated.

3. RLS Implementation - This we need

  • Two-layer security: Server RLS (multi-tenant) + Entity RLS (user isolation)
  • Server RLS: Isolates different ElizaOS instances via server_id column
  • Entity RLS: Isolates user data via entity_id (DM privacy, multi-user)
  • Cloud-ready: This is valuable for cloud deployments
  • Recommendation: Extract RLS-only changes into separate PR

4. Breaking API Changes

  • New domain-based routing: /api/messaging, /api/agents, /api/memory, /api/audio, /api/auth
  • Removed: /world routes (moved to /messaging/spaces)
  • Sessions API: New unified messaging interface (/api/messaging/sessions, /api/messaging/jobs)
  • Impact: All existing API consumers will break
  • Need: Migration guide + deprecation period

Proposed Breakdown

  1. PR 1: RLS infrastructure only (server + entity isolation) - ship this for cloud
  2. PR 2: API restructuring (domain routing, sessions API) - with migration docs
  3. PR 3: Auth system (JWT + user table) - needs architecture discussion first

Critical Questions

  • Do we want opinionated auth or remain auth-agnostic?
  • Can we align user table schema with planned cloud schema?

cc @wtfsayo @ChristopherTrimboli @odilitime

@standujar
Copy link
Collaborator Author

PR #6107 Analysis - Breaking Changes & Discussion Points

Link: #6107

This PR is massive and should be split into smaller, reviewable chunks. Key concerns:

1. User Table - Do we need it?

  • Adds users table with email/password auth (bcrypt hashing)
  • Conflict risk: Will clash with cloud schema design
  • Question: Is this core functionality or should auth be pluggable/external (Privy, Auth0, CDP)?

2. JWT Implementation - Why JWT?

  • Adds full JWT auth stack (jose library, refresh tokens, 7-day expiry)
  • Routes: /api/auth/register, /api/auth/login, /api/auth/refresh, /api/auth/me
  • Only enabled when ENABLE_DATA_ISOLATION=true
  • Question: Should this be core or let end-users implement their own auth provider? Makes ElizaOS more opinionated.

3. RLS Implementation - This we need

  • Two-layer security: Server RLS (multi-tenant) + Entity RLS (user isolation)
  • Server RLS: Isolates different ElizaOS instances via server_id column
  • Entity RLS: Isolates user data via entity_id (DM privacy, multi-user)
  • Cloud-ready: This is valuable for cloud deployments
  • Recommendation: Extract RLS-only changes into separate PR

4. Breaking API Changes

  • New domain-based routing: /api/messaging, /api/agents, /api/memory, /api/audio, /api/auth
  • Removed: /world routes (moved to /messaging/spaces)
  • Sessions API: New unified messaging interface (/api/messaging/sessions, /api/messaging/jobs)
  • Impact: All existing API consumers will break
  • Need: Migration guide + deprecation period

Proposed Breakdown

  1. PR 1: RLS infrastructure only (server + entity isolation) - ship this for cloud
  2. PR 2: API restructuring (domain routing, sessions API) - with migration docs
  3. PR 3: Auth system (JWT + user table) - needs architecture discussion first

Critical Questions

  • Do we want opinionated auth or remain auth-agnostic?
  • Can we align user table schema with planned cloud schema?

cc @wtfsayo @ChristopherTrimboli @odilitime

Thanks a lot for the detailed review !! I fully agree that this needs to be split.

About the JWT/User part:
You’re right to question whether this should be core or not. From the frontend, I currently had no way to pass or validate the entity correctly without adding some form of auth layer, which is why I implemented this version. The implementation was intentionally generic so it could work with any provider (Auth0, Privy, custom, etc.).
That said, we can absolutely remove or rethink this part if we want to stay auth-agnostic. Do you have a preferred direction here?

About the API changes:
Yes, I also plan to re-add the old endpoints with a proper deprecation period to avoid breaking all consumers at once.

Next steps:
I’ll extract the RLS / entity isolation work into its own PR first so we can ship that independently.

…clarity and consistency in messaging functionality
… conventions, ensuring backward compatibility with messageServerId
…ionality across the codebase for improved clarity and consistency in messaging operations
…server_agents' and adjust related identifiers for consistency in migration tests
…proper data removal and handle potential errors
…luding user registration, login, and retrieval methods
- Implemented `isRoomParticipant` method in the `DatabaseAdapter` interface and its SQL implementation to efficiently check if an entity is a participant in a room.
- Added `isChannelParticipant` method in the `BaseDrizzleAdapter` for channel participant verification.
- Updated relevant tests to validate the new participant check functionalities.
- Enhanced authentication middleware to support JWT and API key checks for secure access to channels and rooms.
… and protected features, including session management and API mocking
- Reorganize tests into unit/ and integration/ folders
- Add SocketIOClientFixture and JWTTestHelper utilities
- Fix JWKS verifier test to use manual fetch mock (bun-fetch-mock incompatible)
- Fix SocketIO auth test for DATA_ISOLATION=false case
- Add PostgreSQL detection to migrations (skip on SQLite)
- Rename bootstrap-autoload.test.ts → plugin-auto-injection.test.ts
- Rename middleware-chain.test.ts → auth-middleware-chain.test.ts
- Remove jobs-message-flow.test.ts (merged into other integration tests)
@standujar standujar changed the title feat: Authentication, Entity-level RLS & Security Improvements feat: Authentication, Entity-level RLS & Security Improvements [DO NOT MERGE] Nov 25, 2025
@standujar standujar changed the title feat: Authentication, Entity-level RLS & Security Improvements [DO NOT MERGE] feat: Authentication [DO NOT MERGE] Nov 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants