🔬 YOLO Vision Studio

Real-time Object Detection · Segmentation · Pose Estimation · Tracking, powered by YOLO26, YOLO World v2 & Streamlit


Live Demo · Blog Series · Report Bug


🆕 What's New in v2.0

Thank you for 400+ ⭐ stars! This major update brings a completely rewritten, modular codebase with exciting new capabilities.

| Feature | v1.0 | v2.0 |
| --- | --- | --- |
| Object Detection | YOLOv8n | YOLO26n (NMS-free, 43% faster on CPU) |
| Segmentation | YOLOv8n-seg | YOLO26n-seg (multi-scale proto + semantic loss) |
| Pose Estimation | ❌ | ✅ YOLO26n-pose (RLE-based keypoints) |
| Open-Vocabulary | ❌ | ✅ YOLO World v2 (natural-language text prompts) |
| Tracking | Basic | ByteTrack + BoTSORT with local + global counting |
| Object Counting | ❌ | ✅ Per-frame local + cumulative global counts |
| Skip Frames | ❌ | ✅ 1–8× skip for fast inference on long videos |
| Webcam | OpenCV (broken in the cloud) | ✅ streamlit-webrtc (browser-native) |
| Architecture | Monolithic | Modular service-based design |
| Video Metrics | ❌ | ✅ Live FPS, local/global counts & tracking overlay |
| Codebase | helper.py + settings.py | config · model_loader · image_service · video_service |

✨ Features

📷 Image Inference

  • Object Detection – Detect the 80 COCO classes with YOLO26 (NMS-free, edge-optimized)
  • YOLO World v2 (Text Prompt) – Open-vocabulary detection with natural-language prompts such as "person in black", "red car", or "laptop on table"
  • Instance Segmentation – Pixel-level object segmentation with multi-scale proto modules
  • Pose Estimation – Human body keypoint and skeleton detection with RLE precision
  • Per-class metrics, confidence scores, and a detailed results table

🎬 Video Inference

  • Multiple Sources: Stored videos, Webcam (browser-native via WebRTC), RTSP streams, YouTube URLs
  • Real-time Tracking: ByteTrack and BoTSORT algorithms (enabled by default)
  • Local + Global Counting: Per-frame counts (green) and cumulative unique-object counts (yellow) displayed on every frame
  • Skip Frames: Adjustable 1–8× slider for faster inference on long or high-FPS videos
  • YOLO World v2 in Video: Natural language text-prompt search in video streams
  • Live Metrics: Separate local (this frame) and global (cumulative) sections in sidebar
  • Count Overlay: Two-line on-frame badge, local in green and global in yellow

πŸ—οΈ Architecture

  • Modular Design: Separate services for image and video inference
  • Centralized Config: Single config.py for all settings
  • Cached Models: @st.cache_resource for instant model reuse
  • Clean Routing: Task + Mode based dispatch in app.py
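The weight-resolution step behind cached loading can be sketched as follows. The helper name `resolve_weights` and the `weights/` lookup order are illustrative assumptions, not the repo's actual code; `@st.cache_resource` is the real Streamlit decorator the app uses for model reuse.

```python
from pathlib import Path

def resolve_weights(model_name: str, weights_dir: str = "weights") -> str:
    """Return the local weights path if it exists; otherwise return the
    bare model name so Ultralytics auto-downloads it on first use."""
    local = Path(weights_dir) / model_name
    return str(local) if local.exists() else model_name

# In the app, the resolved path would feed a cached loader, e.g.:
# @st.cache_resource
# def load_model(path: str):
#     from ultralytics import YOLO
#     return YOLO(path)
```

Because the loader is cached, switching tasks in the sidebar reuses already-loaded models instead of re-reading weights from disk.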

📸 Demo

Tracking with Object Detection

Tracking-With_object-Detection-MOV.mov

Application Overview

Yolov8.Streamlit.Rampal.Punia-1.mov

Screenshots

Home Page · Detection Result · Segmentation

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • GPU recommended (NVIDIA CUDA) for real-time video inference
  • Webcam (optional, for live detection)

Installation

# 1. Clone the repository
git clone https://github.com/CodingMantras/yolov8-streamlit-detection-tracking.git
cd yolov8-streamlit-detection-tracking

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate        # Linux / macOS
# venv\Scripts\activate         # Windows

# 3. Install dependencies
pip install -r requirements.txt

Download Model Weights

The default detection and segmentation weights are auto-downloaded by Ultralytics on first use. All YOLO26 models (detection, segmentation, pose) and YOLO World v2 (open-vocabulary) are fetched automatically.

To pre-download manually:

# Detection
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n.pt

# Segmentation
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n-seg.pt

# Pose estimation
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n-pose.pt

# YOLO World v2 open-vocabulary (auto-downloads if not present)
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-worldv2.pt

Run the App

streamlit run app.py

The app opens at http://localhost:8501.


📖 Usage Guide

Sidebar Controls

  1. Inference Mode – Choose between 📷 Image Inference and 🎬 Video Inference
  2. Task – Select one of:
    • Object Detection – Standard YOLO26 object detection (NMS-free, end-to-end)
    • Segmentation – Instance segmentation with pixel masks
    • YOLO World v2 (Text Prompt) – Open-vocabulary detection with natural-language prompts
    • Pose Estimation – Human body keypoint detection
  3. Model Confidence – Adjust the confidence threshold (10–100%)

Image Inference

  1. Select 📷 Image Inference mode
  2. Choose a task (Detection, Segmentation, YOLO World, or Pose)
  3. Upload an image or use the default
  4. For YOLO World v2: type descriptive phrases (e.g., person in black, red car, laptop on table)
  5. Click 🚀 Run to see results with per-class metrics
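The comma-separated prompt box maps onto a class list for the model. The helper `parse_prompts` below is an illustrative sketch, not the repo's code; `YOLOWorld` and `set_classes` are the Ultralytics APIs for open-vocabulary detection.

```python
def parse_prompts(raw: str) -> list[str]:
    """Split a comma-separated prompt string into clean class phrases."""
    return [p.strip() for p in raw.split(",") if p.strip()]

# With Ultralytics (weights download on first use):
# from ultralytics import YOLOWorld
# model = YOLOWorld("yolov8l-worldv2.pt")
# model.set_classes(parse_prompts("person in black, red car"))
# results = model.predict("image.jpg", conf=0.4)
```

Each phrase becomes one detectable "class", so descriptive prompts like "laptop on table" work without any retraining.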

Video Inference

  1. Select 🎬 Video Inference mode
  2. Choose a task
  3. Pick a video source: Stored Video, Webcam, RTSP, or YouTube
  4. Object Tracking is enabled by default (ByteTrack or BoTSORT); local and global counts display automatically
  5. Adjust Skip Frames (1–8) in the sidebar for faster inference on long videos
  6. For YOLO World v2: enter natural-language prompts to search for in the video
  7. Click 🚀 Detect; local and global metrics appear in the sidebar
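The local/global counting idea can be sketched in a few lines: the local count is the number of boxes in the current frame, while the global count is the number of distinct track IDs seen so far. This is a simplified illustration, not the repo's `video_service.py`.

```python
def update_counts(frame_ids, seen_ids):
    """Update counts from one processed frame.

    frame_ids: track IDs detected in the current frame.
    seen_ids:  running set of all IDs seen so far (mutated in place).
    Returns (local_count, global_count).
    """
    seen_ids.update(frame_ids)
    return len(frame_ids), len(seen_ids)

seen = set()
update_counts([1, 2], seen)  # frame 1: two new objects -> local 2, global 2
update_counts([2, 3], seen)  # frame 2: ID 2 persists, ID 3 is new -> local 2, global 3

# Skip-frame control: with skip factor n, only every n-th frame runs inference:
# if frame_idx % skip != 0:
#     continue
```

Because track IDs persist across frames, an object that stays in view is counted once globally no matter how many frames it appears in.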

Adding Your Own Videos

Drop .mp4 files into the videos/ directory. They appear automatically in the stored-video dropdown with no code changes required (the config scans the folder at startup).
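A minimal sketch of that startup scan (the function name `scan_videos` and the returned dict shape are assumptions for illustration, not the repo's exact code):

```python
from pathlib import Path

def scan_videos(videos_dir: str = "videos") -> dict[str, Path]:
    """Map a display name (the file stem) to its path for every .mp4 found."""
    return {p.stem: p for p in sorted(Path(videos_dir).glob("*.mp4"))}
```

The resulting keys are what would populate the stored-video dropdown, sorted alphabetically.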


πŸ—‚οΈ Project Structure

yolov8-streamlit-detection-tracking/
├── app.py                # Main Streamlit application & routing
├── config.py             # Centralized configuration (paths, models, UI)
├── model_loader.py       # Model loading with @st.cache_resource
├── image_service.py      # Image inference (detection, segmentation, world, pose)
├── video_service.py      # Video inference (tracking, counting, all sources)
├── requirements.txt      # Python dependencies
├── packages.txt          # System packages for Streamlit Cloud
├── README.md
├── assets/               # Screenshots and demo media
├── images/               # Sample images
├── videos/               # Sample videos (add your .mp4 files here)
└── weights/              # Model weights (yolo26n.pt, yolo26n-seg.pt, ...)

Module Responsibilities

| Module | Purpose |
| --- | --- |
| config.py | All paths, model names, UI constants, and default values |
| model_loader.py | Cached model loading; resolves local weights vs auto-download |
| image_service.py | Full image-mode UI: upload → inference → results display |
| video_service.py | Full video-mode UI: source selection → frame loop → live metrics |
| app.py | Page config, sidebar, and routing to the correct service |

βš™οΈ Configuration

All configuration lives in config.py. Key settings:

# Models β€” change to larger variants for better accuracy
DETECTION_MODEL    = "yolo26n.pt"        # or yolo26s.pt, yolo26m.pt, yolo26l.pt
SEGMENTATION_MODEL = "yolo26n-seg.pt"    # or yolo26s-seg.pt
YOLO_WORLD_MODEL   = "yolov8l-worldv2.pt" # open-vocabulary (natural language)
POSE_MODEL         = "yolo26n-pose.pt"   # or yolo26s-pose.pt

# Inference defaults
DEFAULT_CONFIDENCE = 0.40
DEFAULT_IOU        = 0.50
VIDEO_DISPLAY_WIDTH = 720

# Skip-frame control for video inference
DEFAULT_SKIP_FRAMES = 1   # process every frame (range 1–8)

# YOLO World v2 default prompts
DEFAULT_WORLD_CLASSES = "person in black, red car, dog, laptop on table"

Custom Models

To use your own trained model:

# In config.py
DETECTION_MODEL = "my_custom_model.pt"
# Place the .pt file in the weights/ directory

☁️ Deploy to Streamlit Cloud

  1. Push the repository to GitHub
  2. Go to share.streamlit.io and connect your repo
  3. Set the main file path to app.py
  4. The packages.txt file handles system-level dependencies automatically

Note: Streamlit Cloud has no GPU, so video inference will be slower; image inference works well. The webcam uses streamlit-webrtc, which runs natively in the browser (no server-side camera access needed).


🤝 Contributing

Contributions are welcome! Here's how:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m "Add amazing feature"
  4. Push: git push origin feature/amazing-feature
  5. Open a Pull Request

Ideas for Contributions

  • Add model benchmarking / comparison page
  • Export detection results to CSV / JSON
  • Add YOLO-NAS or RT-DETR model support
  • Region of Interest (ROI) based counting
  • Multi-camera RTSP dashboard

📚 Resources


📄 License

This project is open-source and available for educational and research purposes.

πŸ™ Acknowledgements


If you find this project useful, please consider giving it a ⭐!

Made with ❀️ by Aparsoft
