# Test Outputs

This directory contains saved outputs from integration tests for inspection and debugging.

## Purpose

Integration tests make real API calls, which are expensive and time-consuming. The `--save-outputs` flag lets you save these responses for later inspection without re-running the tests.

## Usage

### Saving Test Outputs

Run integration tests with the `--save-outputs` flag:

```bash
# Save outputs from all integration tests
pytest -m integration --save-outputs

# Save outputs from a specific test file
pytest tests/integration/test_pydantic_validation_e2e.py --save-outputs

# Save outputs from a specific test
pytest tests/integration/test_pydantic_validation_e2e.py::test_deepsearch_pydantic_validation_e2e --save-outputs
```

### Running Without Saving

By default, tests run normally without saving outputs:

```bash
# No outputs saved (default behavior)
pytest -m integration
```

## Directory Structure

Outputs are organized by test file and test name, with a timestamp per run:

```
test_outputs/
└── integration/
    ├── deepsearch_service_integration/
    │   ├── test_deepsearch_service_preset_integration_20231209_143052/
    │   │   ├── raw_response.json
    │   │   └── extracted_json.json
    │   └── test_deepsearch_service_backward_compatibility_20231209_143105/
    │       └── raw_response.json
    └── pydantic_validation_e2e/
        ├── test_deepsearch_pydantic_validation_e2e_20231209_143210/
        │   ├── raw_response.json
        │   ├── extracted_json.json
        │   └── validation_result.json
        └── test_deepsearch_pydantic_validation_with_moderate_genes_20231209_143245/
            ├── raw_response.json
            ├── extracted_json.json
            └── validation_result.json
```
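
Because each run directory ends in a `YYYYMMDD_HHMMSS` timestamp, the runs for a given test sort chronologically by name. A minimal sketch for picking out the most recent run, assuming the layout shown above:

```python
# Locate the most recent saved run for one test; timestamped directory
# names (YYYYMMDD_HHMMSS) sort chronologically as plain strings.
from pathlib import Path

test_dir = Path("test_outputs/integration/pydantic_validation_e2e")
runs = sorted(test_dir.glob("test_deepsearch_pydantic_validation_e2e_*"))
latest = runs[-1] if runs else None
print(latest)
```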

### Output Files

Each test run creates up to 3 JSON files:

#### 1. `raw_response.json`

Contains the full API response with metadata:

```json
{
  "test_context": {
    "genes": ["TMEM14E"],
    "context": "cells",
    "preset": "perplexity-sonar-pro"
  },
  "response": {
    "markdown": "# Research Results\n\n...",
    "citations": [
      {
        "source_id": "1",
        "notes": "PubMed article..."
      }
    ],
    "provider": "perplexity",
    "model": "sonar-reasoning-pro",
    "duration_seconds": 45.2
  },
  "saved_at": "2023-12-09T14:32:10.123456"
}
```

#### 2. `extracted_json.json` (when available)

Contains the extracted and parsed JSON from the API response:

```json
{
  "context": {
    "cell_type": "cells",
    "disease": "",
    "tissue": ""
  },
  "input_genes": ["TMEM14E"],
  "programs": [
    {
      "program_name": "Example Program",
      "description": "Program description",
      "atomic_biological_processes": [],
      "atomic_cellular_components": [],
      "predicted_cellular_impact": []
    }
  ],
  "version": "1.0"
}
```
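
Because `extracted_json.json` holds exactly the payload that gets validated, a saved copy can be re-validated offline while iterating on schema changes. The models below are hypothetical stand-ins that mirror the fields in the example above (the project's real pydantic models may differ); a sketch assuming pydantic v2:

```python
# Re-validate a saved extracted_json.json offline.
# NOTE: these model definitions are hypothetical stand-ins mirroring the
# example payload above; substitute the project's real pydantic models.
import json
from pathlib import Path

from pydantic import BaseModel, ValidationError


class Program(BaseModel):
    program_name: str
    description: str
    atomic_biological_processes: list[str]  # element type assumed
    atomic_cellular_components: list[str]   # element type assumed
    predicted_cellular_impact: list[str]    # element type assumed


class ExtractedResult(BaseModel):
    context: dict  # cell_type / disease / tissue, kept loose here
    input_genes: list[str]
    programs: list[Program]
    version: str


path = Path("extracted_json.json")  # point this at a file from a saved run
payload = json.loads(path.read_text())
try:
    ExtractedResult.model_validate(payload)  # pydantic v2 API
    print("payload conforms to the (assumed) schema")
except ValidationError as exc:
    print(exc)
```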

#### 3. `validation_result.json` (when available)

Contains pydantic validation metadata:

```json
{
  "success": true,
  "retry_count": 0,
  "validation_time_ms": 12.5,
  "error": null
}
```

## Use Cases

### 1. Debug Validation Failures

When a test fails validation, inspect the saved outputs:

```bash
# Run the test with output saving
pytest tests/integration/test_pydantic_validation_e2e.py::test_deepsearch_pydantic_validation_e2e --save-outputs

# Inspect the extracted JSON
cat test_outputs/integration/pydantic_validation_e2e/test_deepsearch_pydantic_validation_e2e_*/extracted_json.json | jq .

# Check validation errors
cat test_outputs/integration/pydantic_validation_e2e/test_deepsearch_pydantic_validation_e2e_*/validation_result.json | jq .
```
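
To sweep every saved run for validation failures, rather than inspecting one test at a time, a short Python loop over the same files works. A sketch using only the fields documented above:

```python
# Report every saved run whose validation_result.json records a failure.
import json
from pathlib import Path

for path in sorted(Path("test_outputs").rglob("validation_result.json")):
    result = json.loads(path.read_text())
    if not result.get("success", False):
        print(f"{path.parent.name}: retries={result.get('retry_count')} error={result.get('error')}")
```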

### 2. Compare API Responses

Save outputs from different test runs to compare:

```bash
# Run with different presets
pytest tests/integration/test_deepsearch_service_integration.py --save-outputs

# Compare two saved responses (the glob must expand to exactly two files)
diff test_outputs/integration/deepsearch_service_integration/test_*_preset_*/raw_response.json
```

### 3. Inspect Citations

Review citation quality without re-running expensive API calls:

```bash
# Extract citations from saved outputs
cat test_outputs/integration/*/test_*/raw_response.json | jq '.response.citations'
```

### 4. Performance Analysis

Analyze API response times:

```bash
# Extract the duration from all saved outputs
find test_outputs -name "raw_response.json" -exec jq -r '.response.duration_seconds' {} \;
```
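
For summary statistics rather than a raw list of per-run numbers, the same `duration_seconds` field can be aggregated in a few lines of Python. A sketch over whatever runs have been saved:

```python
# Aggregate API call durations across all saved raw_response.json files.
import json
from pathlib import Path
from statistics import mean

durations = [
    json.loads(p.read_text())["response"]["duration_seconds"]
    for p in Path("test_outputs").rglob("raw_response.json")
]
if durations:
    print(f"runs={len(durations)} mean={mean(durations):.1f}s max={max(durations):.1f}s")
```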

## Notes

- Outputs are **NOT committed to git** (excluded via `.gitignore`)
- Each test run creates a **new timestamped directory** (outputs are never overwritten)
- The directory is **only created when `--save-outputs` is used**
- Test performance is **not impacted** when the flag is not used
- Outputs contain **real API responses**; do not commit them if they contain sensitive data

## Implementation Details

The output-saving infrastructure consists of:

- **`tests/utils/test_output_saver.py`**: Core utility class for saving outputs
- **`tests/conftest.py`**: Pytest fixtures providing the `save_test_output` function
- **Integration tests**: Updated to call `save_test_output()` when the fixture is available

See `tests/unit/test_output_saver.py` for unit tests of the output-saving infrastructure.
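
For orientation, the wiring in `tests/conftest.py` likely resembles the sketch below. Registering the `--save-outputs` option is standard pytest; the fixture body and the `TestOutputSaver` interface shown here are assumptions for illustration, not the actual implementation:

```python
# tests/conftest.py (sketch): wiring a --save-outputs flag to a save_test_output fixture.
# NOTE: the TestOutputSaver class name, constructor, and save() method are assumptions
# for illustration; see tests/utils/test_output_saver.py for the real utility.
import pytest

from tests.utils.test_output_saver import TestOutputSaver  # module path per this README


def pytest_addoption(parser):
    parser.addoption(
        "--save-outputs",
        action="store_true",
        default=False,
        help="Save integration test outputs under test_outputs/ for later inspection",
    )


@pytest.fixture
def save_test_output(request):
    """Return a callable that writes a named JSON artifact, or a no-op when the flag is off."""
    if not request.config.getoption("--save-outputs"):
        return lambda name, data: None  # no directories created, no overhead
    saver = TestOutputSaver(test_name=request.node.name)  # hypothetical constructor
    return saver.save  # hypothetical: save(name, data) writes <name>.json under a timestamped dir
```

An integration test would then accept `save_test_output` as a fixture argument and call it once per artifact (for example `raw_response`, `extracted_json`, `validation_result`), matching the file names shown under Directory Structure.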