An interactive data visualization dashboard for exploring patterns in global terrorism through Principal Component Analysis (PCA) and k-means clustering.
- Overview
- Data Source
- Features
- Key Insights
- Implementation Details
- Installation and Usage
- Project Structure
- Technologies Used
- Future Improvements
- License
- Acknowledgements
This project implements an interactive dashboard for analyzing the Global Terrorism Database (GTD). It applies Principal Component Analysis (PCA) for dimensionality reduction and k-means clustering to surface patterns and relationships in the data.
The visualization provides interactive tools to:
- Determine optimal dimensionality through PCA
- Identify the most influential features in the dataset
- Discover natural groupings using k-means clustering
- Explore relationships between key terrorism attributes
The project uses the Global Terrorism Database (GTD), a comprehensive open-source database of terrorist events worldwide:
- Source: START Consortium (National Consortium for the Study of Terrorism and Responses to Terrorism)
- Scope: Global terrorist incidents from 1970-2020
- Size: 200,000+ incidents with 135 attributes
- Focus Area: For this analysis, we sampled 10,000 records from 2016-2020 and selected 20 key numerical features
- Scree Plot: Interactive bar chart of eigenvalues with cumulative variance overlay
- Dimensionality Selection: Click a bar in the scree plot to select the number of components to retain
- Biplot: Visualization of data points on first two principal components with feature vector overlays
- Reference Lines: Clear x=0 and y=0 axes for better interpretation
- Top Features Algorithm: Identifies attributes with highest PCA loadings based on selected dimensionality
- Dynamic Feature Table: Shows most influential features with importance scores
- Scatterplot Matrix: Displays relationships between top 4 features
- K-means Elbow Plot: Visualizes inertia vs. k values to identify optimal cluster count
- Interactive Cluster Selection: Allows users to select k by clicking on the elbow plot
- Consistent Cluster Coloring: Maintains color scheme across all visualizations
- All visualizations are linked through callbacks
- A selection in one view (number of components or k) updates every other component
- Together, the linked views form one cohesive analytical environment
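The Top Features Algorithm listed above could be implemented in several ways. The sketch below assumes one common choice: score each attribute by the sum of its squared loadings over the selected components, then keep the highest-scoring attributes. The function name `top_features`, the synthetic data, and the scoring rule itself are assumptions for illustration, not necessarily what `data_processing.py` does.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def top_features(X, feature_names, n_components, top_n=4):
    """Rank features by summed squared loadings on the first n_components PCs.

    Hypothetical helper: the scoring rule is an assumption, not the
    project's confirmed method.
    """
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components).fit(X_std)
    # pca.components_ has shape (n_components, n_features).
    scores = np.sum(pca.components_ ** 2, axis=0)
    order = np.argsort(scores)[::-1][:top_n]
    return [(feature_names[i], float(scores[i])) for i in order]


# Synthetic stand-in for the numerical GTD columns (not real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
X_demo[:, 1] = X_demo[:, 0] + 0.1 * rng.normal(size=200)  # two correlated columns
names_demo = [f"feature_{i}" for i in range(6)]

ranking = top_features(X_demo, names_demo, n_components=2, top_n=4)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because the principal axes are orthonormal, a score computed this way always lies between 0 and 1, and it changes whenever a different number of components is selected.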
Analysis of the Global Terrorism Database revealed several interesting patterns:
- Dimensionality Reduction: Approximately 11 principal components capture ~80% of the variance in terrorism data.
- Most Influential Features:
  - Day of incident (iday): 0.87 importance score
  - Number of perpetrators captured (nperpcap): 0.83 importance score
  - Property damage indicator (property): 0.61 importance score
  - Event identifier (eventid): 0.52 importance score
- Optimal Clustering: The elbow method identified 7 distinct clusters of terrorism incidents, representing different terrorism typologies.
- Cluster Characteristics:
  - Distinct separation in temporal patterns (incident days) across clusters
  - Varying patterns in perpetrator captures, suggesting different operational success rates
  - Clear differentiation in property damage patterns
  - Notable outliers representing unusual terrorism events
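A figure such as "about 11 components for ~80% of the variance" is typically read from the cumulative explained-variance curve on the scree plot. The sketch below shows that lookup; the helper name `components_for_variance` and the demo ratios are illustrative assumptions, not values from the GTD analysis.

```python
import numpy as np


def components_for_variance(explained_variance_ratio, threshold=0.80):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio)
    # First index where the cumulative ratio is >= threshold.
    idx = int(np.searchsorted(cumulative, threshold, side="left"))
    # Cap at the total count in case rounding keeps the sum just below threshold.
    return min(idx + 1, len(cumulative))


# Made-up ratios for illustration only.
demo_ratios = [0.40, 0.30, 0.20, 0.10]
n_keep = components_for_variance(demo_ratios, threshold=0.80)
print(n_keep)
```

With scikit-learn, the input would be the `explained_variance_ratio_` attribute of a fitted `PCA` object.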
- Loading and filtering GTD dataset to recent years (2016-2020)
- Selection of numerical attributes suitable for PCA
- Handling missing values by column mean imputation
- Standardization of features for equal weighting in PCA
- Computation of PCA components, eigenvalues, and explained variance
- Implementation of k-means clustering for k=1...10 with inertia calculation
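The processing steps above can be sketched end to end with pandas and scikit-learn. This is a minimal illustration under stated assumptions: the function name `run_pipeline`, the year column name `iyear`, the random seed, and the synthetic demo frame are all hypothetical, and the real `data_processing.py` may be organized differently.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def run_pipeline(df, year_col="iyear", start=2016, end=2020,
                 sample_size=10_000, k_max=10, seed=0):
    """Filter, sample, impute, standardize, then run PCA and k-means."""
    # Keep recent incidents and draw a fixed-size sample.
    recent = df[df[year_col].between(start, end)]
    recent = recent.sample(n=min(sample_size, len(recent)), random_state=seed)
    # Only numerical attributes are suitable for PCA.
    numeric = recent.select_dtypes(include="number")
    # Column-mean imputation, then standardization for equal weighting.
    X = SimpleImputer(strategy="mean").fit_transform(numeric)
    X = StandardScaler().fit_transform(X)
    pca = PCA().fit(X)
    # Inertia for k = 1..k_max feeds the elbow plot.
    inertias = [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in range(1, k_max + 1)
    ]
    return pca.explained_variance_, pca.explained_variance_ratio_, inertias


# Synthetic stand-in for the GTD spreadsheet (values are random, not real).
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "iyear": rng.integers(2012, 2021, size=n),
    "iday": rng.integers(1, 29, size=n).astype(float),
    "nperpcap": rng.poisson(1.0, size=n).astype(float),
    "property": rng.integers(0, 2, size=n).astype(float),
    "label": ["demo"] * n,  # non-numeric column, dropped by select_dtypes
})
demo.loc[rng.random(n) < 0.1, "nperpcap"] = np.nan  # simulate missing values

eigenvalues, ratios, inertias = run_pipeline(demo)
```

The eigenvalues and ratios would drive the scree plot, and the inertia list would drive the elbow plot.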
- Improved Biplot:
  - Reference lines at x=0 and y=0 to delineate quadrants
  - Strategic feature vector display to reduce visual clutter
  - Increased visibility of feature labels
  - Balanced axis scale for proper representation
- Scatterplot Matrix Optimization:
  - Dynamic updating based on selected dimensionality
  - Clear visual separation of clusters
  - Consistent color scheme with other visualizations
- Elbow Plot Interface:
  - Clear indication of optimal k value
  - Interactive selection capability
  - Visual feedback on selection
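The biplot refinements above (a balanced axis scale, and a reduced set of feature vectors) come down to a small amount of geometry that can be computed before any plotting. The sketch below is one possible approach, not the project's confirmed implementation: the helper name `biplot_geometry`, the 5% padding, the 80% vector stretch, and the rule of keeping the longest vectors are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def biplot_geometry(X, feature_names, max_vectors=5):
    """Compute PC scores, a symmetric axis limit, and scaled loading vectors."""
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2).fit(X_std)
    scores = pca.transform(X_std)   # shape (n_samples, 2)
    loadings = pca.components_.T    # shape (n_features, 2)
    # Same limit on both axes, so x=0 and y=0 cross at the plot center.
    limit = float(np.abs(scores).max()) * 1.05
    # Keep only the longest loading vectors to reduce clutter.
    lengths = np.linalg.norm(loadings, axis=1)
    keep = np.argsort(lengths)[::-1][:max_vectors]
    # Stretch vectors so the longest spans 80% of the half-axis.
    scale = 0.8 * limit / lengths[keep[0]]
    vectors = {feature_names[i]: loadings[i] * scale for i in keep}
    return scores, vectors, limit


# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 6))
names_demo = [f"feature_{i}" for i in range(6)]
scores, vectors, limit = biplot_geometry(X_demo, names_demo, max_vectors=3)
```

The scores could then be drawn as a scatter trace with both axis ranges set to `[-limit, limit]`, and each entry of `vectors` drawn as a labeled line from the origin.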
- Python 3.8+
- Plotly
- NumPy, Pandas, Scikit-learn
- Modern web browser (Chrome, Firefox, Safari)
- Clone the repository:
    git clone https://github.com/Vkartik-3/gtd-visualization.git
    cd gtd-visualization
- Install required dependencies:
    pip install -r requirements.txt
- Run the application:
    python app.py
- Open your browser and navigate to:
    http://localhost:5000
- Exploring PCA:
  - View the scree plot to understand variance distribution
  - Click on a bar to select number of components to retain
  - Examine the biplot to understand feature relationships
- Understanding Feature Importance:
  - After selecting dimensionality, check the Feature Importance table
  - Explore the scatterplot matrix to see relationships between key features
- Investigating Clusters:
  - View the elbow plot to determine optimal number of clusters
  - Click on a point to select k value
  - Observe how clusters are distributed across different visualizations
Global-terrorism-data-visualization/
├── __pycache__/ # Python cache directory
├── assets/ # Assets directory
│ └── styles.css # CSS styling file
├── data/ # Data directory
│ ├── .DS_Store # macOS directory file
│ └── globalterrorismdb.xlsx # Original GTD dataset
├── app.py # Main application file
├── data_processing.py # Data preprocessing and analysis functions
└── README.md # Project documentation
- Data Processing: Python, NumPy, Pandas
- Machine Learning: Scikit-learn
- Visualization: Plotly
- Web Framework: Flask
- Development Tools: Git, VS Code
- Temporal Analysis: Add time-based visualizations to examine trends over years
- Geographical Integration: Incorporate maps to visualize spatial patterns
- Feature Engineering: Explore additional derived features for enhanced insights
- Alternative Clustering: Implement DBSCAN or hierarchical clustering for comparison
- Real-time Updates: Create pipeline for incorporating new GTD data as it's released
This project is licensed under the MIT License - see the LICENSE file for details.
- START Consortium for providing the Global Terrorism Database
- CSE-564 Visualization course at Stony Brook University
- Plotly for interactive visualization capabilities
- Scikit-learn for machine learning implementations
Developed by Kartik Kirankumar Vadhawana (116740869) as part of CSE-564 Visualization coursework at Stony Brook University.



