| 1 | +# TART Backend Test Suite
| 2 | +
| 3 | +This document explains the testing architecture and best practices for the TART telemetry backend.
| 4 | + |
| 5 | +## Test Organization |
| 6 | + |
| 7 | +### Unit Tests (No Database Required) |
| 8 | +These tests run in parallel and don't require external services: |
| 9 | +- `types_tests.rs` - Type encoding/decoding (12 tests) |
| 10 | +- `events_tests.rs` - Event serialization (18 tests) |
| 11 | +- `error_tests.rs` - Error handling and edge cases (15 tests) |
| 12 | +- `encoding_tests.rs` - Binary protocol encoding (16 tests) |
| 13 | +- Library tests in `src/` - Core logic (10 tests) |
| 14 | + |
| 15 | +**Total: 71 unit tests** |
| 16 | + |
| 17 | +### Integration Tests (Require PostgreSQL) |
| 18 | +These tests use a real PostgreSQL database and MUST run serially: |
| 19 | +- `api_tests.rs` - REST API endpoints (10 tests) |
| 20 | +- `integration_tests.rs` - End-to-end telemetry flow (8 tests) |
| 21 | +- `optimized_server_tests.rs` - Performance and concurrency (6 tests) |
| 22 | + |
| 23 | +**Total: 24 integration tests** |
| 24 | + |
| 25 | +## Running Tests Locally |
| 26 | + |
| 27 | +```bash |
| 28 | +# Unit tests only (fast, no setup needed) |
| 29 | +cargo test --lib --test types_tests --test events_tests --test error_tests --test encoding_tests |
| 30 | + |
| 31 | +# Integration tests (requires PostgreSQL) |
| 32 | +export TEST_DATABASE_URL="postgres://tart:tart_password@localhost:5432/tart_test" |
| 33 | + |
| 34 | +# Start PostgreSQL (using docker-compose) |
| 35 | +docker-compose up -d postgres |
| 36 | + |
| 37 | +# Create test database and run migrations |
| 38 | +cargo sqlx database create --database-url "$TEST_DATABASE_URL"
| 39 | +cargo sqlx migrate run --database-url "$TEST_DATABASE_URL"
| 40 | + |
| 41 | +# Run integration tests SERIALLY |
| 42 | +cargo test --test api_tests --test integration_tests --test optimized_server_tests -- --test-threads=1 |
| 43 | +``` |
| 44 | + |
| 45 | +## Why Tests Must Run Serially (`--test-threads=1`) |
| 46 | + |
| 47 | +**Problem**: Integration tests share the same PostgreSQL database `tart_test`. |
| 48 | + |
| 49 | +**Without serial execution:** |
| 50 | +- Test A connects 2 nodes → expects 2 in database |
| 51 | +- Test B connects 2 nodes → expects 2 in database |
| 52 | +- Tests run in parallel → each can see up to 4 nodes (or have its rows truncated by the other's cleanup) → BOTH FAIL
| 53 | + |
| 54 | +**Solution**: Run with `--test-threads=1` to execute one test at a time. |
| 55 | + |
| 56 | +With serial execution, each test:
| 57 | +1. Cleans the database (TRUNCATE all tables)
| 58 | +2. Runs its scenario
| 59 | +3. Finishes before the next test cleans the database and runs
| 60 | + |
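The interference described above can be reproduced without PostgreSQL. The following is a minimal sketch, assuming a hypothetical in-memory `FakeDb` as a stand-in for the shared `tart_test` database; none of these names come from the TART codebase. It compares the serial order with one possible parallel interleaving.

```rust
// Hypothetical in-memory stand-in for the shared `tart_test` database.
#[derive(Default)]
struct FakeDb {
    nodes: Vec<u32>,
}

impl FakeDb {
    /// Mirrors the per-test cleanup step (TRUNCATE all tables).
    fn truncate(&mut self) {
        self.nodes.clear();
    }

    fn connect_node(&mut self, id: u32) {
        self.nodes.push(id);
    }

    fn node_count(&self) -> usize {
        self.nodes.len()
    }
}

/// Serial order: test A runs to completion, then test B.
/// Returns the node counts observed by A and by B.
fn run_serially(db: &mut FakeDb) -> (usize, usize) {
    db.truncate();
    db.connect_node(1);
    db.connect_node(2);
    let seen_by_a = db.node_count();

    db.truncate();
    db.connect_node(3);
    db.connect_node(4);
    let seen_by_b = db.node_count();

    (seen_by_a, seen_by_b)
}

/// One possible parallel interleaving: both tests clean up,
/// then both insert, then both query the shared state.
fn run_interleaved(db: &mut FakeDb) -> (usize, usize) {
    db.truncate(); // test A cleanup
    db.truncate(); // test B cleanup
    db.connect_node(1); // test A
    db.connect_node(3); // test B
    db.connect_node(2); // test A
    db.connect_node(4); // test B
    (db.node_count(), db.node_count())
}
```

In the serial order each test observes exactly its own two nodes. In the interleaved order both observe four, which is the failure mode `--test-threads=1` avoids.
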
| 61 | +## The Flush Pattern - Why It's Necessary |
| 62 | + |
| 63 | +### The Problem: Asynchronous Background Writer |
| 64 | + |
| 65 | +TART uses a `BatchWriter` that runs in a background task for performance: |
| 66 | + |
| 67 | +``` |
| 68 | +Test sends data → Queues in channel → Background task → Batches → PostgreSQL |
| 69 | + ↓ |
| 70 | +Test continues immediately! |
| 71 | + ↓ |
| 72 | +Test queries database... but data might not be written yet! |
| 73 | +``` |
| 74 | + |
| 75 | +This race exists even though:
| 76 | +- Node connections flush immediately (lines 151 and 214 in batch_writer.rs)
| 77 | +- Events batch every 20ms or 1000 events
| 78 | +
| 79 | +The `node_connected()` method returns as soon as it queues the message, not when the write has reached PostgreSQL.
| 80 | + |
| 81 | +### The Solution: Explicit Flush with Synchronization |
| 82 | + |
| 83 | +```rust |
| 84 | +// Test helper that WAITS for flush to complete |
| 85 | +async fn flush_and_wait(telemetry_server: &Arc<TelemetryServer>) { |
| 86 | + telemetry_server.flush_writes().await.expect("Flush failed"); |
| 87 | + sleep(Duration::from_millis(50)).await; // PostgreSQL commit margin |
| 88 | +} |
| 89 | + |
| 90 | +// In tests: |
| 91 | +connect_test_node(port, 1).await; |
| 92 | +flush_and_wait(&server).await; // ← waits until the queued writes have been flushed
| 93 | +let response = get("/api/nodes").await; // The node should now be visible to the API
| 94 | +``` |
| 95 | + |
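To make the mechanism concrete, here is a minimal sketch of a flush that waits for an acknowledgement from a background writer. It uses only the standard library (threads and `std::sync::mpsc`) instead of async tasks. `Msg`, `Writer`, and the in-memory `store` are hypothetical names for illustration, not TART's `BatchWriter`. Unlike the real writer, this one writes only on an explicit flush, which keeps the example deterministic.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical message type; TART's real BatchWriter is not shown in this document.
enum Msg {
    Event(String),
    // Carries a reply channel so the caller can wait for the write to finish.
    Flush(Sender<()>),
}

struct Writer {
    tx: Sender<Msg>,
}

impl Writer {
    /// Spawns the background thread; `store` stands in for PostgreSQL.
    fn spawn(store: Arc<Mutex<Vec<String>>>) -> Writer {
        let (tx, rx) = channel::<Msg>();
        thread::spawn(move || {
            let mut batch = Vec::new();
            // The loop ends when every sender has been dropped.
            for msg in rx {
                match msg {
                    // Queued in the batch, not yet visible in the store.
                    Msg::Event(e) => batch.push(e),
                    Msg::Flush(ack) => {
                        store.lock().unwrap().append(&mut batch);
                        // Signal that everything queued before the flush is written.
                        let _ = ack.send(());
                    }
                }
            }
        });
        Writer { tx }
    }

    /// Returns as soon as the event is queued, like `node_connected()`.
    fn send_event(&self, e: &str) {
        self.tx.send(Msg::Event(e.to_string())).unwrap();
    }

    /// Blocks until the background thread has written everything queued so far.
    fn flush(&self) {
        let (ack_tx, ack_rx) = channel();
        self.tx.send(Msg::Flush(ack_tx)).unwrap();
        ack_rx.recv().unwrap();
    }
}
```

Because the channel preserves send order, the acknowledgement can only arrive after every earlier event has been moved into the store. A test that calls `flush()` before querying therefore never observes a partially written batch.
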
| 96 | +### Why Not Just Sleep Longer? |
| 97 | + |
| 98 | +❌ **Bad approach:** |
| 99 | +```rust |
| 100 | +connect_test_node(port, 1).await; |
| 101 | +sleep(Duration::from_millis(5000)).await; // Hope this is enough? |
| 102 | +let response = get("/api/nodes").await; |
| 103 | +``` |
| 104 | + |
| 105 | +Problems: |
| 106 | +- Non-deterministic: Might work locally, fail in CI |
| 107 | +- Slow: Wastes time waiting |
| 108 | +- Brittle: Breaks if server is under load |
| 109 | + |
| 110 | +✅ **Good approach (current):** |
| 111 | +```rust |
| 112 | +connect_test_node(port, 1).await; |
| 113 | +flush_and_wait(&server).await; // Deterministic, fast, reliable |
| 114 | +let response = get("/api/nodes").await; |
| 115 | +``` |
| 116 | + |
| 117 | +## Test Isolation Pattern |
| 118 | + |
| 119 | +### Database Cleanup (`#[cfg(test)]` protected) |
| 120 | + |
| 121 | +```rust |
| 122 | +#[cfg(test)] |
| 123 | +impl EventStore { |
| 124 | +    pub async fn cleanup_test_data(&self) -> Result<(), sqlx::Error> {
| 125 | +        sqlx::query("TRUNCATE TABLE events, nodes, ...").execute(&self.pool).await.map(|_| ()) // discard the query result, keep the error
| 126 | +    }
| 127 | +} |
| 128 | +``` |
| 129 | + |
| 130 | +**Safety features:** |
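| 131 | +- Only compiled when `cfg(test)` is set (not available in production builds). Integration tests under `tests/` link against the library's normal build, so a helper they call has to be exposed to them separately, for example behind a Cargo feature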
| 132 | +- Used in `setup_test_api()` before each test |
| 133 | +- Ensures clean state for every test |
| 134 | + |
| 135 | +### Common Test Fixtures |
| 136 | + |
| 137 | +Located in `tests/common/mod.rs`: |
| 138 | +- `test_protocol_params()` - Creates valid ProtocolParameters |
| 139 | +- `test_node_info(peer_id)` - Creates valid NodeInformation |
| 140 | +- Reduces duplication across test files |
| 141 | + |
| 142 | +## Best Practices We Follow |
| 143 | + |
| 144 | +✅ **Test Isolation**: Each test starts with a clean database
| 145 | +✅ **Deterministic**: `flush_writes()` instead of arbitrary sleeps
| 146 | +✅ **Safety**: Dangerous methods protected with #[cfg(test)] |
| 147 | +✅ **Clear Intent**: Well-documented test helpers |
| 148 | +✅ **Fast Unit Tests**: No database for 71 tests |
| 149 | +✅ **Realistic Integration Tests**: Real PostgreSQL for 24 tests |
| 150 | +✅ **CI Optimized**: Parallel unit tests, serial integration tests |
| 151 | + |
| 152 | +## Alternative Approaches Considered |
| 153 | + |
| 154 | +### 1. Separate Database Per Test |
| 155 | +```rust |
| 156 | +let db_name = format!("tart_test_{}", uuid::Uuid::new_v4().simple()); // hyphen-free, so the name needs no quoting
| 157 | +// Create database, run test, drop database |
| 158 | +``` |
| 159 | +- ✅ Perfect isolation |
| 160 | +- ❌ Very slow (create/drop overhead) |
| 161 | +- ❌ CI complexity |
| 162 | + |
| 163 | +### 2. In-Memory Mock Database |
| 164 | +```rust |
| 165 | +let store = Arc::new(MockEventStore::new()); |
| 166 | +``` |
| 167 | +- ✅ Fast tests |
| 168 | +- ❌ Doesn't test real PostgreSQL behavior |
| 169 | +- ❌ Can miss query bugs, index issues, etc. |
| 170 | + |
| 171 | +### 3. Transaction Rollback Pattern |
| 172 | +```sql
| 173 | +BEGIN TRANSACTION;
| 174 | +-- Run test
| 175 | +ROLLBACK;
| 176 | +``` |
| 177 | +- ✅ Good isolation |
| 178 | +- ❌ Can't use with async background writers |
| 179 | +- ❌ Doesn't work with multiple connections |
| 180 | + |
| 181 | +### 4. Separate Writer for Tests |
| 182 | +```rust |
| 183 | +#[cfg(test)] |
| 184 | +struct SyncWriter { ... } // No batching |
| 185 | +#[cfg(not(test))] |
| 186 | +struct BatchWriter { ... } // Batching |
| 187 | +``` |
| 188 | +- ✅ Tests are simple |
| 189 | +- ❌ Tests don't match production behavior |
| 190 | +- ❌ Large code duplication |
| 191 | + |
| 192 | +**Our chosen approach (explicit flush + serial execution) keeps production behavior under test, uses a real PostgreSQL database, and avoids per-test database setup cost.**
| 193 | + |
| 194 | +## Common Pitfalls |
| 195 | + |
| 196 | +### ❌ Running Integration Tests in Parallel |
| 197 | +```bash |
| 198 | +cargo test # BAD: Tests conflict in shared database |
| 199 | +``` |
| 200 | + |
| 201 | +### ✅ Correct Way |
| 202 | +```bash |
| 203 | +cargo test --test api_tests -- --test-threads=1 |
| 204 | +``` |
| 205 | + |
| 206 | +### ❌ Forgetting to Flush |
| 207 | +```rust |
| 208 | +connect_test_node(port, 1).await; |
| 209 | +// Immediately query - DATA MIGHT NOT BE THERE YET |
| 210 | +let response = get("/api/nodes").await; |
| 211 | +``` |
| 212 | + |
| 213 | +### ✅ Correct Pattern |
| 214 | +```rust |
| 215 | +connect_test_node(port, 1).await; |
| 216 | +flush_and_wait(&server).await; // Ensure data is written |
| 217 | +let response = get("/api/nodes").await; |
| 218 | +``` |
| 219 | + |
| 220 | +## CI Configuration |
| 221 | + |
| 222 | +The GitHub Actions workflow (`.github/workflows/ci.yml`) runs these commands:
| 223 | +
| 224 | +```bash
| 225 | +# Unit tests in parallel (fast) |
| 226 | +cargo test --lib --test types_tests --test events_tests --test error_tests --test encoding_tests |
| 227 | + |
| 228 | +# Integration tests serially (safe) |
| 229 | +cargo test --test api_tests --test integration_tests --test optimized_server_tests -- --test-threads=1 |
| 230 | +``` |
| 231 | + |
| 232 | +This ensures: |
| 233 | +- Fast feedback for unit tests |
| 234 | +- Reliable integration tests |
| 235 | +- No database conflicts |
| 236 | + |
| 237 | +## Future Improvements |
| 238 | + |
| 239 | +Potential enhancements for the test suite: |
| 240 | + |
| 241 | +1. **Test fixtures with realistic data** - Pre-populate database with sample nodes/events |
| 242 | +2. **Property-based testing** - Use proptest for fuzz testing encoders |
| 243 | +3. **Load testing** - Verify 1024 concurrent connections |
| 244 | +4. **Chaos testing** - Simulate network failures, database outages |
| 245 | +5. **Benchmark suite** - Track performance regressions |
| 246 | + |
| 247 | +## Summary |
| 248 | + |
| 249 | +Our testing approach follows industry best practices: |
| 250 | +- Separate unit and integration tests |
| 251 | +- Explicit synchronization instead of sleeps |
| 252 | +- Test isolation through database cleanup |
| 253 | +- Safety through compile-time checks (#[cfg(test)]) |
| 254 | +- Clear documentation of patterns |
| 255 | + |
| 256 | +The flush pattern is used by many production systems (Kafka, async loggers, batch processors) and is the correct solution for testing asynchronous background workers. |