govindgnair23/SAR_to_Trxns

SAR to Transactions

Financial institutions are required to report suspicious activity to law enforcement by filing Suspicious Activity Reports (SARs). This is an ongoing Python project that transforms SARs into structured transactions using agentic workflows. The extracted transaction data can be used to:

  1. Backtest Transaction Monitoring Systems using a Simulator
  2. Train ML Models on historical SARs
  3. Build a Knowledge Graph of Historical Suspicious Activities

🧠 Project Overview

This project uses a multi-agent AI system to transform unstructured SAR narratives into structured transaction data. The system employs specialized AI agents working in coordinated workflows to:

  • Parse and interpret complex SAR narratives
  • Extract entities (individuals, organizations, financial institutions, accounts)
  • Generate structured transaction records with complete metadata
  • Support parallel processing for large-scale SAR analysis

Key Capabilities

  • Entity Extraction and Resolution: Automatically identifies and resolves entity references within narratives
  • Transaction Generation: Uses tools selectively to simulate a large volume of transactions, or extracts transactions faithfully when the narrative fully specifies them
  • Evaluation Framework: Built-in metrics and validation for accuracy assessment

🏗️ Architecture Overview

The system operates through two coordinated workflows:

Workflow 1: Entity Extraction & Resolution

SAR Narrative → Entity_Extraction_Agent → Entity_Resolution_Agent → Narrative_Extraction_Agent
  1. Entity_Extraction_Agent: Identifies individuals, organizations, financial institutions, account IDs, and locations
  2. Entity_Resolution_Agent: Maps accounts to customer IDs and financial institutions
  3. Narrative_Extraction_Agent: Creates account-specific sub-narratives for transaction extraction
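
As a rough illustration of step 2, the sketch below assigns one customer ID per account holder, so that several accounts held by the same entity share an ID. It is a deterministic stand-in written for this README, not the project's agent: the function name `resolve_accounts`, the `CUST_` ID format, the input shape, and the third account number are all assumptions.

```python
from typing import Dict, List, Tuple


def resolve_accounts(holdings: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each account ID to a customer ID, one customer ID per holder.

    `holdings` is a list of (holder_name, account_id) pairs, in a
    hypothetical format an entity-extraction step might produce.
    """
    customer_ids: Dict[str, str] = {}        # normalized holder -> customer ID
    account_to_customer: Dict[str, str] = {}
    for holder, account_id in holdings:
        # Normalize case and whitespace so spelling variants collapse together.
        key = " ".join(holder.lower().split())
        if key not in customer_ids:
            customer_ids[key] = f"CUST_{len(customer_ids) + 1:03d}"
        account_to_customer[account_id] = customer_ids[key]
    return account_to_customer


mapping = resolve_accounts([
    ("XYZ Consulting LLC", "56789-1234"),
    ("Michael Smith", "67890-4321"),
    ("michael  smith", "11111-0000"),   # made-up account, same holder
])
print(mapping)
```

Simple name normalization only catches trivial spelling variants; resolving aliases or abbreviations would need richer logic than this sketch shows.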

Workflow 2: Transaction Generation

Sub-Narratives → Router_Agent → Transaction_Generation_Agent → Structured Transactions
  1. Router_Agent: Routes narratives to appropriate transaction generation agents
  2. Transaction_Generation_Agent (with tool): Synthesizes structured transactions with complete metadata
  3. Parallel Processing: Handles multiple sub-narratives concurrently for performance
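
The concurrent handling in step 3 could look roughly like the sketch below, which fans sub-narratives out to a thread pool and flattens the results. The function names are hypothetical and `generate_transactions` is a stub standing in for the real agent; how the repository actually parallelizes work is not shown in this README.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def generate_transactions(sub_narrative: str) -> List[Dict[str, str]]:
    """Stand-in for the transaction generation agent (hypothetical stub)."""
    # A real agent would call a model here; the stub only echoes its input.
    return [{"source": sub_narrative, "Trxn_Channel": "Wire"}]


def process_in_parallel(sub_narratives: List[str],
                        max_workers: int = 4) -> List[Dict[str, str]]:
    """Run the generator over all sub-narratives concurrently.

    `pool.map` returns results in input order, so the output is deterministic.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(generate_transactions, sub_narratives))
    # Flatten the per-narrative lists into one list of transaction records.
    return [trxn for batch in batches for trxn in batch]


records = process_in_parallel([
    "Account 56789-1234: 15 wire transfers totaling $450,000",
    "Account 67890-4321: 10 wire transfers totaling $300,000",
])
print(len(records))
```

Threads are a reasonable fit here because calls to a hosted model are I/O-bound, so workers spend most of their time waiting on the network.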

Data Flow

Raw SAR Text → Entities & Relationships → Sub-Narratives → Transaction Records → CSV/JSON Output

📁 Directory Structure

SAR_NARRATIVES_TO_TRXNS/
├── agents/              # Agent logic (LLMs or rules)
├── configs/             # Config files for parameters, paths, etc.
├── data/                # Input or processed datasets
├── evals/               # Evaluation scripts or results
├── experiments/         # Experiments and test runs
├── temp/                # Temporary or intermediate files
├── tests/               # Unit tests (using unittest)
├── venv/                # Python virtual environment
├── .gitignore
├── main.py              # Entry point
├── README.md
├── requirements.txt
└── utils.py             # Helper functions


🚀 Getting Started

1. Clone the repository

git clone https://github.com/govindgnair23/SAR_to_Trxns.git
cd SAR_to_Trxns

2. Create a virtual environment and install dependencies

python -m venv venv
source venv/bin/activate     # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Configure Environment Variables

Create a .env file in the root directory with your OpenAI API key:

# .env file
OPEN_API_KEY=your_openai_api_key_here

Security Note: Never commit your .env file to version control. The .gitignore file should already exclude it.
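
A minimal sketch of how a script might fail fast when the key is missing. It assumes the variable name `OPEN_API_KEY` shown above and that the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); the helper `require_api_key` is hypothetical and not part of the repository.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPEN_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


# Demonstration with an explicit mapping, so no real key is needed.
print(require_api_key({"OPEN_API_KEY": "your_openai_api_key_here"}))
```

Checking once at startup gives a readable error before any agent runs, instead of an authentication failure partway through a workflow.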

4. Configure Agents (Optional)

The agent configurations are stored in configs/agents_config.yaml. You can modify:

  • Model types (gpt-4o-mini, gpt-4.1, etc.)
  • Temperature settings
  • System prompts
  • Agent behavior parameters

5. Run the project

Command Line Interface:

python main.py data/input/sar_test_01.txt

Web Interface:

streamlit run ui.py

The web interface provides an interactive way to upload SAR files and visualize results.


🧪 Running Tests

This project uses the built-in unittest framework.

To run all tests:

python -m unittest discover -s tests -p 'test_*.py'

Run Evaluations:

# Evaluate Workflow 1 (Entity Extraction)
python evals/eval_workflow1.py

# Evaluate Workflow 2 (Transaction Generation)  
python evals/eval_workflow2.py

📊 Input & Output Formats

Input Format

The system accepts SAR narrative text files. Example structure:

Investigation case number: B7845120. Michael Smith, the owner of XYZ Consulting LLC, 
is suspected of engaging in suspicious wire transfer activities...

Between February 1, 2023, and May 15, 2023, Smith initiated 15 wire transfers 
totaling $450,000 from the business account (#56789-1234) and 10 wire transfers 
totaling $300,000 from his personal account (#67890-4321)...

Output Format

The system generates structured transaction data in CSV format:

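| Transaction_ID | Originator_Name | Originator_Account_ID | Beneficiary_Name | Trxn_Amount | Trxn_Date  | Trxn_Channel |
|----------------|-----------------|-----------------------|------------------|-------------|------------|--------------|
| 1              | Michael Smith   | 56789-1234            | Unknown          | 50000       | 2023-02-01 | Wire         |
| 2              | Michael Smith   | 67890-4321            | Unknown          | 30000       | 2023-02-15 | Wire         |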

Complete Fields:

  • Originator_Name, Originator_Account_ID, Originator_Customer_ID
  • Beneficiary_Name, Beneficiary_Account_ID, Beneficiary_Customer_ID
  • Trxn_Channel, Trxn_Date, Trxn_Amount, Branch_or_ATM_Location
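
To make the link between a summarized narrative and these fields concrete, here is a sketch that expands "15 wire transfers totaling $450,000" between February 1 and May 15, 2023 into individual rows and writes them as CSV. The equal split and even date spacing are simplifying assumptions, and `simulate_transfers` is a hypothetical helper; the project's own tool may distribute amounts and dates differently.

```python
import csv
import io
from datetime import date, timedelta


def simulate_transfers(originator, account_id, count, total, start, end,
                       first_id=1):
    """Spread `count` near-equal wire transfers evenly between `start` and `end`."""
    base, remainder = divmod(round(total * 100), count)  # work in cents
    span_days = (end - start).days
    rows = []
    for i in range(count):
        # The first `remainder` transfers carry one extra cent so the sum is exact.
        cents = base + (1 if i < remainder else 0)
        offset = span_days * i // (count - 1) if count > 1 else 0
        rows.append({
            "Transaction_ID": first_id + i,
            "Originator_Name": originator,
            "Originator_Account_ID": account_id,
            "Beneficiary_Name": "Unknown",
            "Trxn_Amount": cents / 100,
            "Trxn_Date": (start + timedelta(days=offset)).isoformat(),
            "Trxn_Channel": "Wire",
        })
    return rows


rows = simulate_transfers("Michael Smith", "56789-1234", 15, 450_000,
                          date(2023, 2, 1), date(2023, 5, 15))
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Working in integer cents keeps the simulated amounts summing exactly to the reported total, which matters if the output is later checked against the narrative.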

🧰 Tech Stack

  • Language: Python 3.8+
  • AI Framework: AutoGen 0.2 (Multi-agent orchestration)
  • ML Models: OpenAI GPT-4.1, GPT-4o-mini
  • Data Processing: Pandas, NumPy
  • Web Interface: Streamlit
  • Configuration: PyYAML, python-dotenv
  • Visualization: NetworkX, Pyvis
  • Testing: unittest
  • Version Control: Git + GitHub

🔍 Evaluation Framework

The project includes comprehensive evaluation workflows:

Workflow 1 Evaluation

  • Entity Metrics: Precision, recall, F1-score for entity extraction
  • Account Mapping: Accuracy of account-to-customer relationships
  • Output: data/output/evals/workflow1/results_entity_metrics_*.csv
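
For reference, the entity metrics listed above can be computed from sets of predicted and expected entities as in this sketch. The function `entity_prf` and the sample sets are illustrative assumptions ("Smith LLC" is a made-up false positive); the repository's evaluation scripts may match entities differently, for example with fuzzy matching.

```python
from typing import Iterable, Tuple


def entity_prf(predicted: Iterable[str],
               expected: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match set comparison."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


expected = {"Michael Smith", "XYZ Consulting LLC", "56789-1234", "67890-4321"}
predicted = {"Michael Smith", "XYZ Consulting LLC", "56789-1234", "Smith LLC"}
precision, recall, f1 = entity_prf(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```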

Workflow 2 Evaluation

  • Transaction Metrics: Count accuracy, amount precision, date validation
  • Completeness: Field population rates
  • Output: data/output/evals/workflow2/results_trxn_metrics_*.csv
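
The completeness metric can be read as a per-field population rate. The sketch below computes it over the output fields named in this README, treating `None` and blank strings as unpopulated. That rule, the helper name `field_population_rates`, and the `CUST_001` value are assumptions for illustration; the evaluation scripts may, for instance, also count "Unknown" as missing.

```python
from typing import Any, Dict, List

FIELDS = [
    "Originator_Name", "Originator_Account_ID", "Originator_Customer_ID",
    "Beneficiary_Name", "Beneficiary_Account_ID", "Beneficiary_Customer_ID",
    "Trxn_Channel", "Trxn_Date", "Trxn_Amount", "Branch_or_ATM_Location",
]


def is_populated(value: Any) -> bool:
    """A value counts as populated unless it is None or a blank string."""
    return value is not None and str(value).strip() != ""


def field_population_rates(records: List[Dict[str, Any]],
                           fields: List[str] = FIELDS) -> Dict[str, float]:
    """Return the fraction of records with a populated value, per field."""
    total = len(records)
    return {
        field: (sum(is_populated(r.get(field)) for r in records) / total
                if total else 0.0)
        for field in fields
    }


records = [
    {"Originator_Name": "Michael Smith", "Originator_Account_ID": "56789-1234",
     "Originator_Customer_ID": "CUST_001", "Beneficiary_Name": "Unknown",
     "Trxn_Channel": "Wire", "Trxn_Date": "2023-02-01", "Trxn_Amount": 50000},
    {"Originator_Name": "Michael Smith", "Originator_Account_ID": "67890-4321",
     "Originator_Customer_ID": "", "Beneficiary_Name": "Unknown",
     "Trxn_Channel": "Wire", "Trxn_Date": "2023-02-15", "Trxn_Amount": 30000},
]
rates = field_population_rates(records)
for field, rate in rates.items():
    print(f"{field}: {rate:.0%}")
```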

Run Evaluations:

# Interactive evaluation with UI
python evals/eval_workflow1_ui.py
python evals/eval_workflow2_ui.py

✅ Features

  • ✅ Multi-Agent AI System: Specialized agents for entity extraction, resolution, and transaction generation
  • ✅ Two-Workflow Architecture: Coordinated pipelines for comprehensive SAR processing
  • ✅ Entity Resolution: Mapping of multiple accounts held by the same entity to the same Customer ID
  • ✅ Structured Output: Complete transaction records with metadata fields
  • ✅ Web Interface: Streamlit-based UI for interactive SAR processing
  • ✅ Comprehensive Evaluation: Built-in metrics and validation frameworks
  • ✅ Configurable Agents: YAML-based configuration for model selection and behavior
  • ✅ Security Best Practices: Environment-based API key management

🔒 Security & Compliance

Data Security:

  • Store API keys in environment variables, never in code
  • SAR data contains sensitive information; follow your organization's data-handling policies
  • Generated transaction data should be treated as confidential

📌 Roadmap

Completed:

  • Multi-agent architecture with specialized roles
  • Parallel processing capabilities
  • Web interface integration
  • Comprehensive evaluation framework

In Progress:

  • Support for tabular SAR formats

Future Enhancements:

  • Incorporation into a more comprehensive SAR-to-Knowledge-Graph framework

About

Extract transactions of interest from a Suspicious Activity Report (SAR)
