Financial institutions are required to report suspicious activity to law enforcement by filing Suspicious Activity Reports (SARs). This is an ongoing Python project that uses agentic workflows to transform SAR narratives into structured transactions. The extracted transaction data can be used to:
- Backtest Transaction Monitoring Systems using a Simulator
- Train ML Models on historical SARs
- Build a Knowledge Graph of Historical Suspicious Activities
This project uses a multi-agent AI system to transform unstructured SAR narratives into structured transaction data. The system employs specialized AI agents working in coordinated workflows to:
- Parse and interpret complex SAR narratives
- Extract entities (individuals, organizations, financial institutions, accounts)
- Generate structured transaction records with complete metadata
- Support parallel processing for large-scale SAR analysis
- Entity Extraction and Resolution: Automatically identifies and resolves entity references within narratives
- Transaction Generation: Uses tools judiciously to simulate a large volume of transactions when a narrative only summarizes them, or extracts transactions faithfully when they are fully specified.
- Evaluation Framework: Built-in metrics and validation for accuracy assessment
The system operates through two coordinated workflows:
Workflow 1 (Entity Extraction): SAR Narrative → Entity_Extraction_Agent → Entity_Resolution_Agent → Narrative_Extraction_Agent
- Entity_Extraction_Agent: Identifies individuals, organizations, financial institutions, account IDs, and locations
- Entity_Resolution_Agent: Maps accounts to customer IDs and financial institutions
- Narrative_Extraction_Agent: Creates account-specific sub-narratives for transaction extraction
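The output of Entity_Resolution_Agent can be pictured as a lookup from account ID to customer ID and financial institution. The sketch below is a minimal illustration, assuming the extraction step yields (holder, account ID, institution) records and that holder names are already normalized. The `AccountRecord` type, the `resolve_accounts` function, and the `C-001` ID format are hypothetical; the actual agent performs this mapping with an LLM rather than exact string matching.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class AccountRecord:
    """Hypothetical shape of one extracted account reference."""
    holder: str
    account_id: str
    institution: str


def resolve_accounts(records: Iterable[AccountRecord]) -> Dict[str, Dict[str, str]]:
    """Map each account ID to a customer ID, reusing the ID for repeat holders.

    Exact name matching stands in for the LLM-based resolution the agent does.
    """
    customer_ids: Dict[str, str] = {}  # holder name -> assigned customer ID
    mapping: Dict[str, Dict[str, str]] = {}  # account ID -> customer ID + institution
    for rec in records:
        if rec.holder not in customer_ids:
            # First time this holder is seen: assign the next sequential ID.
            customer_ids[rec.holder] = f"C-{len(customer_ids) + 1:03d}"
        mapping[rec.account_id] = {
            "customer_id": customer_ids[rec.holder],
            "institution": rec.institution,
        }
    return mapping
```

Every account held by the same entity ends up under one customer ID, which is the property the Entity Resolution feature describes; a production resolver would also need to handle name variants and aliases.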
Workflow 2 (Transaction Generation): Sub-Narratives → Router_Agent → Transaction_Generation_Agent → Structured Transactions
- Router_Agent: Routes narratives to appropriate transaction generation agents
- Transaction_Generation_Agent (with tools): Synthesizes structured transactions with complete metadata
- Parallel Processing: Handles multiple sub-narratives concurrently for performance
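Concurrent handling of sub-narratives can be approximated with the standard library's `concurrent.futures`. This is a minimal sketch, assuming each sub-narrative can be processed independently. `generate_transactions` is a hypothetical placeholder for the Router_Agent → Transaction_Generation_Agent call, not the project's API. Threads suit this workload because each call spends most of its time waiting on network I/O to the LLM API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def generate_transactions(sub_narrative: str) -> List[Dict]:
    """Hypothetical stand-in for routing one sub-narrative and generating its transactions."""
    # A real implementation would call the LLM agents here; this stub just
    # echoes the input so the example runs offline.
    return [{"source": sub_narrative}]


def process_sub_narratives(sub_narratives: List[str], max_workers: int = 4) -> List[Dict]:
    """Process sub-narratives concurrently and flatten the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs,
        # even though the underlying calls may finish out of order.
        per_narrative = pool.map(generate_transactions, sub_narratives)
        return [trxn for batch in per_narrative for trxn in batch]
```

Preserving input order keeps the flattened transaction list deterministic, which makes outputs easier to compare across evaluation runs.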
Raw SAR Text → Entities & Relationships → Sub-Narratives → Transaction Records → CSV/JSON Output
SAR_NARRATIVES_TO_TRXNS/
├── agents/ # Agent logic (LLMs or rules)
├── configs/ # Config files for parameters, paths, etc.
├── data/ # Input or processed datasets
├── evals/ # Evaluation scripts or results
├── experiments/ # Experiments and test runs
├── temp/ # Temporary or intermediate files
├── tests/ # Unit tests (using unittest)
├── venv/ # Python virtual environment
├── .gitignore
├── main.py # Entry point
├── README.md
├── requirements.txt
└── utils.py # Helper functions
git clone https://github.com/govindgnair23/SAR_to_Trxns.git
cd SAR_to_Trxns
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the root directory with your OpenAI API key:
# .env file
OPEN_API_KEY=your_openai_api_key_here
Security Note: Never commit your .env file to version control. The .gitignore file should already exclude it.
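At runtime the key has to move from `.env` into the process environment. python-dotenv (listed in the tech stack) does this with `load_dotenv`. Below is a minimal sketch, assuming the variable name from the `.env` example above; `get_api_key` is a hypothetical helper. The official OpenAI Python client looks for `OPENAI_API_KEY` by default, so check which name the project's code actually reads.

```python
import os

try:
    # python-dotenv copies KEY=value pairs from .env into os.environ.
    # By default it does not overwrite variables that are already set.
    from dotenv import load_dotenv

    # Path is relative to the working directory (the repo root when
    # running the commands in this README).
    load_dotenv(dotenv_path=".env")
except ImportError:
    # Without python-dotenv, fall back to the existing process environment.
    pass


def get_api_key(var_name: str = "OPEN_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key
```

Failing early with a named variable is easier to debug than an authentication error surfacing later from inside an agent call.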
The agent configurations are stored in configs/agents_config.yaml. You can modify:
- Model types (gpt-4o-mini, gpt-4.1, etc.)
- Temperature settings
- System prompts
- Agent behavior parameters
Command Line Interface:
python main.py data/input/sar_test_01.txt
Web Interface:
streamlit run ui.py
The web interface provides an interactive way to upload SAR files and visualize results.
This project uses the built-in unittest framework.
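Tests live in `tests/` as modules whose names match `test_*.py`, the pattern the discover command below uses. The module here is an illustration only. It assumes a hypothetical file `tests/test_trxn_dates.py` and a hypothetical `parse_trxn_date` helper, neither of which is part of the project. The test checks the `YYYY-MM-DD` format that `Trxn_Date` uses in the sample output.

```python
# tests/test_trxn_dates.py  (hypothetical module; any test_*.py file is discovered)
import unittest
from datetime import date


def parse_trxn_date(value: str) -> date:
    """Hypothetical helper: parse a Trxn_Date in YYYY-MM-DD form."""
    return date.fromisoformat(value)


class TestParseTrxnDate(unittest.TestCase):
    def test_iso_date_is_parsed(self):
        self.assertEqual(parse_trxn_date("2023-02-01"), date(2023, 2, 1))

    def test_non_iso_date_is_rejected(self):
        # US-style slash dates are not accepted by date.fromisoformat.
        with self.assertRaises(ValueError):
            parse_trxn_date("02/01/2023")
```

In a real test module, the helper would be imported from project code rather than defined inline.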
To run all tests:
python -m unittest discover -s tests -p 'test_*.py'
Run Evaluations:
# Evaluate Workflow 1 (Entity Extraction)
python evals/eval_workflow1.py
# Evaluate Workflow 2 (Transaction Generation)
python evals/eval_workflow2.py
The system accepts SAR narrative text files. Example structure:
Investigation case number: B7845120. Michael Smith, the owner of XYZ Consulting LLC,
is suspected of engaging in suspicious wire transfer activities...
Between February 1, 2023, and May 15, 2023, Smith initiated 15 wire transfers
totaling $450,000 from the business account (#56789-1234) and 10 wire transfers
totaling $300,000 from his personal account (#67890-4321)...
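This narrative summarizes activity ("15 wire transfers totaling $450,000" between February 1 and May 15, 2023) instead of listing each transfer. In that case the Transaction_Generation_Agent can call a tool to expand the summary into individual records. The function below is a hypothetical example of such a tool, not the project's implementation. It splits the total into whole-dollar amounts that sum exactly to the stated figure and draws dates uniformly within the range, using a seed for reproducibility.

```python
import random
from datetime import date, timedelta
from typing import Dict, List


def simulate_transfers(count: int, total: int, start: date, end: date,
                       channel: str = "Wire", seed: int = 0) -> List[Dict]:
    """Hypothetical tool: expand 'N transfers totaling X' into N records.

    Amounts are whole dollars, each at least 1, summing exactly to `total`;
    dates are drawn uniformly from [start, end] inclusive and sorted.
    A fixed seed makes the output reproducible.
    """
    if count < 1 or total < count or end < start:
        raise ValueError("need count >= 1, total >= count and start <= end")
    rng = random.Random(seed)
    # Choose count - 1 distinct cut points in 1..total-1; the gaps between
    # consecutive cuts are positive amounts that add up to total.
    cuts = sorted(rng.sample(range(1, total), count - 1))
    bounds = [0] + cuts + [total]
    amounts = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    span_days = (end - start).days
    dates = sorted(start + timedelta(days=rng.randint(0, span_days))
                   for _ in range(count))
    return [
        {"Trxn_Amount": amount, "Trxn_Date": day.isoformat(),
         "Trxn_Channel": channel}
        for amount, day in zip(amounts, dates)
    ]
```

With the narrative's figures, `simulate_transfers(15, 450_000, date(2023, 2, 1), date(2023, 5, 15))` yields 15 wire records whose amounts sum to exactly 450,000. Random cut points are only one way to split a total. A tool could also split it evenly or sample from an empirical amount distribution.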
The system generates structured transaction data in CSV format:
| Transaction_ID | Originator_Name | Originator_Account_ID | Beneficiary_Name | Trxn_Amount | Trxn_Date | Trxn_Channel |
|---|---|---|---|---|---|---|
| 1 | Michael Smith | 56789-1234 | Unknown | 50000 | 2023-02-01 | Wire |
| 2 | Michael Smith | 67890-4321 | Unknown | 30000 | 2023-02-15 | Wire |
Complete Fields:
Originator_Name,Originator_Account_ID,Originator_Customer_ID,Beneficiary_Name,Beneficiary_Account_ID,Beneficiary_Customer_ID,Trxn_Channel,Trxn_Date,Trxn_Amount,Branch_or_ATM_Location
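Records in this layout can be written with the standard library's `csv.DictWriter`. This is a minimal sketch, assuming transactions arrive as dicts keyed by the field names above; `write_transactions_csv` is a hypothetical helper, not the project's writer.

```python
import csv
from typing import Dict, Iterable, TextIO

# Column order copied from the complete field list in this README.
FIELDS = [
    "Originator_Name", "Originator_Account_ID", "Originator_Customer_ID",
    "Beneficiary_Name", "Beneficiary_Account_ID", "Beneficiary_Customer_ID",
    "Trxn_Channel", "Trxn_Date", "Trxn_Amount", "Branch_or_ATM_Location",
]


def write_transactions_csv(rows: Iterable[Dict], out: TextIO) -> None:
    """Write transaction dicts as CSV; missing fields are left blank."""
    # restval fills absent fields; extrasaction="raise" turns a misspelled
    # field name into a ValueError instead of silently dropping it.
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="",
                            extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to a file, open it with `newline=""` as the `csv` module documentation recommends, for example `with open("trxns.csv", "w", newline="") as f: write_transactions_csv(rows, f)`.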
- Language: Python 3.8+
- AI Framework: AutoGen 0.2 (Multi-agent orchestration)
- ML Models: OpenAI GPT-4.1, GPT-4o-mini
- Data Processing: Pandas, NumPy
- Web Interface: Streamlit
- Configuration: PyYAML, python-dotenv
- Visualization: NetworkX, Pyvis
- Testing: unittest
- Version Control: Git + GitHub
The project includes evaluation workflows for both pipelines:
Workflow 1 (Entity Extraction):
- Entity Metrics: Precision, recall, and F1-score for entity extraction
- Account Mapping: Accuracy of account-to-customer relationships
- Output: data/output/evals/workflow1/results_entity_metrics_*.csv
Workflow 2 (Transaction Generation):
- Transaction Metrics: Count accuracy, amount precision, date validation
- Completeness: Field population rates
- Output: data/output/evals/workflow2/results_trxn_metrics_*.csv
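The entity metrics follow the standard set-based definitions: precision is the share of predicted entities that are correct, and recall is the share of expected entities that were found. This is a minimal sketch, assuming exact string matching. The scripts in `evals/` may normalize or match entities differently. The `count_accuracy` formula is a hypothetical example of a transaction-count metric, not necessarily the one the project reports.

```python
from typing import Dict, Iterable


def entity_prf(predicted: Iterable[str], expected: Iterable[str]) -> Dict[str, float]:
    """Precision, recall and F1 over sets of entity strings (exact match)."""
    pred, gold = set(predicted), set(expected)
    tp = len(pred & gold)  # entities both predicted and expected
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def count_accuracy(predicted_count: int, expected_count: int) -> float:
    """Hypothetical metric: 1.0 when counts match, falling with relative error."""
    if expected_count == 0:
        return 1.0 if predicted_count == 0 else 0.0
    return max(0.0, 1.0 - abs(predicted_count - expected_count) / expected_count)
```

For instance, if the extractor finds Michael Smith, XYZ Consulting LLC, and account 56789-1234 but misses account 67890-4321, precision is 1.0, recall is 0.75, and F1 is 6/7 ≈ 0.857.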
Run Evaluations:
# Interactive evaluation with UI
python evals/eval_workflow1_ui.py
python evals/eval_workflow2_ui.py
- ✅ Multi-Agent AI System: Specialized agents for entity extraction, resolution, and transaction generation
- ✅ Two-Workflow Architecture: Coordinated pipelines for comprehensive SAR processing
- ✅ Entity Resolution: Maps multiple accounts held by the same entity to the same Customer ID
- ✅ Structured Output: Complete transaction records with metadata fields
- ✅ Web Interface: Streamlit-based UI for interactive SAR processing
- ✅ Comprehensive Evaluation: Built-in metrics and validation frameworks
- ✅ Configurable Agents: YAML-based configuration for model selection and behavior
- ✅ Security Best Practices: Environment-based API key management
Data Security:
- Store API keys in environment variables, never in code
- SAR data contains sensitive information; follow your organization's data handling policies
- Generated transaction data should be treated as confidential
Completed:
- Multi-agent architecture with specialized roles
- Parallel processing capabilities
- Web interface integration
- Comprehensive evaluation framework
In Progress:
- Support for tabular SAR formats
Future Enhancements:
- Incorporation into a more comprehensive SAR-to-Knowledge-Graph framework