Real-time Object Detection · Segmentation · Pose Estimation · Tracking Powered by YOLO26, YOLO World v2 & Streamlit
Live Demo · Blog Series · Report Bug
Thank you for 400+ ⭐ stars! This major update brings a completely rewritten, modular codebase with exciting new capabilities.
| Feature | v1.0 | v2.0 |
|---|---|---|
| Object Detection | YOLOv8n | YOLO26n (NMS-free, 43% faster CPU) |
| Segmentation | YOLOv8n-seg | YOLO26n-seg (multi-scale proto + semantic loss) |
| Pose Estimation | ❌ | ✅ YOLO26n-pose (RLE-based keypoints) |
| Open-Vocabulary | ❌ | ✅ YOLO World v2 (natural language text prompts) |
| Tracking | Basic | ByteTrack + BoTSORT with local + global counting |
| Object Counting | ❌ | ✅ Per-frame local + cumulative global counts |
| Skip Frames | ❌ | ✅ 1–8× skip for fast inference on long videos |
| Webcam | OpenCV (broken in cloud) | ✅ streamlit-webrtc (browser-native) |
| Architecture | Monolithic | Modular service-based design |
| Video Metrics | ❌ | ✅ Live FPS, local/global counts & tracking overlay |
| Codebase | helper.py + settings.py | config · model_loader · image_service · video_service |
- Object Detection – Detect 80+ COCO classes with YOLO26 (NMS-free, edge-optimized)
- YOLO World v2 (Text Prompt) – Natural language prompts like "person in black", "red car", "laptop on table" for open-vocabulary detection
- Instance Segmentation – Pixel-level object segmentation with multi-scale proto modules
- Pose Estimation – Human body keypoint and skeleton detection with RLE precision
- Per-class metrics, confidence scores and detailed results table
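As an illustration of how free-form text prompts become class labels, the comma-separated prompt string can be split into individual phrases before being handed to the open-vocabulary model. This helper is a hypothetical sketch, not the app's actual code:

```python
def parse_world_prompts(text: str) -> list[str]:
    """Split a comma-separated prompt string (e.g. "person in black, red car")
    into individual class phrases, trimming whitespace and dropping empties.
    Hypothetical helper; the app's own parsing may differ."""
    return [phrase.strip() for phrase in text.split(",") if phrase.strip()]
```

The resulting list is what an open-vocabulary model such as YOLO World v2 expects as its class vocabulary.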
- Multiple Sources: Stored videos, Webcam (browser-native via WebRTC), RTSP streams, YouTube URLs
- Real-time Tracking: ByteTrack and BoTSORT algorithms (enabled by default)
- Local + Global Counting: Per-frame counts (green) and cumulative unique-object counts (yellow) displayed on every frame
- Skip Frames: Adjustable 1–8× slider for faster inference on long or high-FPS videos
- YOLO World v2 in Video: Natural language text-prompt search in video streams
- Live Metrics: Separate local (this frame) and global (cumulative) sections in sidebar
- Count Overlay: Two-line on-frame badge – local in green, global in yellow
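The local vs. global counting described above can be sketched as follows. This is a minimal illustration (class and method names are assumed, not the app's actual implementation): local counts tally detections in the current frame, while global counts accumulate unique tracker IDs per class.

```python
from collections import Counter

class ObjectCounter:
    """Tracks per-frame (local) and cumulative unique (global) object counts."""

    def __init__(self):
        self._seen = {}  # class name -> set of tracker IDs seen so far

    def update(self, detections):
        """detections: iterable of (class_name, track_id) for one frame.
        Returns (local_counts, global_counts) as plain dicts."""
        local = Counter(name for name, _ in detections)
        for name, track_id in detections:
            self._seen.setdefault(name, set()).add(track_id)
        global_counts = {name: len(ids) for name, ids in self._seen.items()}
        return dict(local), global_counts
```

Because a tracker re-identifies the same object across frames, the global count stays stable while the local count resets every frame.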
- Modular Design: Separate services for image and video inference
- Centralized Config: Single `config.py` for all settings
- Cached Models: `@st.cache_resource` for instant model reuse
- Clean Routing: Task + Mode based dispatch in `app.py`
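The task + mode dispatch can be pictured as a small lookup table. The sketch below uses stand-in service functions (names assumed, not the real `image_service` / `video_service` entry points) to show the idea:

```python
# Stand-ins for the real service entry points (hypothetical names).
def run_image_task(task: str) -> str:
    return f"image:{task}"

def run_video_task(task: str) -> str:
    return f"video:{task}"

SERVICES = {
    "Image Inference": run_image_task,
    "Video Inference": run_video_task,
}

def route(mode: str, task: str) -> str:
    """Dispatch a sidebar selection (mode + task) to the matching service."""
    if mode not in SERVICES:
        raise ValueError(f"Unknown inference mode: {mode!r}")
    return SERVICES[mode](task)
```

Keeping routing in one place means adding a new task only touches the service modules, not the dispatcher.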
Demo videos: `Tracking-With_object-Detection-MOV.mov` · `Yolov8.Streamlit.Rampal.Punia-1.mov`
| Home Page | Detection Result | Segmentation |
|---|---|---|
- Python 3.9 or higher
- GPU recommended (NVIDIA CUDA) for real-time video inference
- Webcam (optional, for live detection)
```bash
# 1. Clone the repository
git clone https://github.com/CodingMantras/yolov8-streamlit-detection-tracking.git
cd yolov8-streamlit-detection-tracking

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate   # Linux / macOS
# venv\Scripts\activate    # Windows

# 3. Install dependencies
pip install -r requirements.txt
```

The default detection and segmentation weights are auto-downloaded by Ultralytics on first use. All YOLO26 models (detection, segmentation, pose) and YOLO World v2 (open-vocabulary) are fetched automatically.
To pre-download manually:
```bash
# Detection
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n.pt

# Segmentation
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n-seg.pt

# Pose estimation
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n-pose.pt

# YOLO World v2 open-vocabulary (auto-downloads if not present)
wget -P weights/ https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-worldv2.pt
```

```bash
streamlit run app.py
```

The app opens at http://localhost:8501.
- Inference Mode – Choose between 📷 Image Inference or 🎬 Video Inference
- Task – Select one of:
  - Object Detection – Standard YOLO26 object detection (NMS-free, end-to-end)
  - Segmentation – Instance segmentation with pixel masks
  - YOLO World v2 (Text Prompt) – Open-vocabulary detection with natural language prompts
  - Pose Estimation – Human body keypoint detection
- Model Confidence – Adjust the confidence threshold (10–100%)
- Select 📷 Image Inference mode
- Choose a task (Detection, Segmentation, YOLO World, or Pose)
- Upload an image or use the default
- For YOLO World v2: type descriptive phrases (e.g., `person in black, red car, laptop on table`)
- Click 🚀 Run to see results with per-class metrics
- Select 🎬 Video Inference mode
- Choose a task
- Pick a video source: Stored Video, Webcam, RTSP, or YouTube
- Object Tracking is enabled by default (ByteTrack or BoTSORT) – local + global counts display automatically
- Adjust Skip Frames (1–8) in the sidebar for faster inference on long videos
- For YOLO World v2: enter natural language prompts to search for in the video
- Click 🚀 Detect – local and global metrics appear in the sidebar
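The Skip Frames control maps naturally onto a stride over frame indices. A minimal sketch of that idea (not the app's exact loop):

```python
def frames_to_process(total_frames: int, skip: int) -> list[int]:
    """Return the frame indices to run inference on for a given skip factor.
    skip=1 processes every frame; skip=4 processes every 4th frame."""
    if not 1 <= skip <= 8:
        raise ValueError("skip must be between 1 and 8")
    return list(range(0, total_frames, skip))
```

At skip=4, a 30 FPS clip is inferred on roughly 7.5 frames per second of video, which is why higher skip values speed up long or high-FPS videos.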
Drop .mp4 files into the videos/ directory. They appear automatically in the stored-video dropdown – no code changes required (the config scans the folder at startup).
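The startup scan amounts to a simple glob over the videos directory. A sketch, assuming the config exposes something along these lines (names are illustrative):

```python
from pathlib import Path

VIDEOS_DIR = Path("videos")  # assumed location, matching the project layout

def discover_videos(directory: Path = VIDEOS_DIR) -> list[str]:
    """Return stored-video filenames for the dropdown, sorted for stable display."""
    return sorted(p.name for p in directory.glob("*.mp4"))
```

Sorting keeps the dropdown order deterministic across restarts regardless of filesystem ordering.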
```
yolov8-streamlit-detection-tracking/
├── app.py              # Main Streamlit application & routing
├── config.py           # Centralized configuration (paths, models, UI)
├── model_loader.py     # Model loading with @st.cache_resource
├── image_service.py    # Image inference (detection, segmentation, world, pose)
├── video_service.py    # Video inference (tracking, counting, all sources)
├── requirements.txt    # Python dependencies
├── packages.txt        # System packages for Streamlit Cloud
├── README.md
├── assets/             # Screenshots and demo media
├── images/             # Sample images
├── videos/             # Sample videos (add your .mp4 files here)
└── weights/            # Model weights (yolo26n.pt, yolo26n-seg.pt, ...)
```
| Module | Purpose |
|---|---|
| `config.py` | All paths, model names, UI constants, and default values |
| `model_loader.py` | Cached model loading; resolves local weights vs auto-download |
| `image_service.py` | Full image-mode UI: upload → inference → results display |
| `video_service.py` | Full video-mode UI: source selection → frame loop → live metrics |
| `app.py` | Page config, sidebar, and routing to the correct service |
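The "local weights vs auto-download" resolution in model_loader.py can be pictured as a small path check. This is a hedged sketch; the actual function name and logic in the project may differ:

```python
from pathlib import Path

WEIGHTS_DIR = Path("weights")  # assumed weights directory from the layout above

def resolve_weights(model_name: str) -> str:
    """Prefer a pre-downloaded file in weights/; otherwise return the bare
    model name so Ultralytics fetches it on first use."""
    local = WEIGHTS_DIR / model_name
    return str(local) if local.exists() else model_name
```

In the real module this resolution is paired with `@st.cache_resource` so each model is loaded once per server process and reused across reruns.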
All configuration lives in `config.py`. Key settings:
```python
# Models – change to larger variants for better accuracy
DETECTION_MODEL = "yolo26n.pt"           # or yolo26s.pt, yolo26m.pt, yolo26l.pt
SEGMENTATION_MODEL = "yolo26n-seg.pt"    # or yolo26s-seg.pt
YOLO_WORLD_MODEL = "yolov8l-worldv2.pt"  # open-vocabulary (natural language)
POSE_MODEL = "yolo26n-pose.pt"           # or yolo26s-pose.pt

# Inference defaults
DEFAULT_CONFIDENCE = 0.40
DEFAULT_IOU = 0.50
VIDEO_DISPLAY_WIDTH = 720

# Skip-frame control for video inference
DEFAULT_SKIP_FRAMES = 1  # process every frame (1-8)

# YOLO World v2 default prompts
DEFAULT_WORLD_CLASSES = "person in black, red car, dog, laptop on table"
```

To use your own trained model:
```python
# In config.py
DETECTION_MODEL = "my_custom_model.pt"
# Place the .pt file in the weights/ directory
```

- Push the repository to GitHub
- Go to share.streamlit.io and connect your repo
- Set the main file path to `app.py`
- The `packages.txt` file handles system-level dependencies automatically
Note: Streamlit Cloud has no GPU – video inference will be slower. Image inference works well. Webcam uses streamlit-webrtc so it works natively in the browser (no server-side camera access needed).
Contributions are welcome! Here's how:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit changes: `git commit -m "Add amazing feature"`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
- Add model benchmarking / comparison page
- Export detection results to CSV / JSON
- Add YOLO-NAS or RT-DETR model support
- Region of Interest (ROI) based counting
- Multi-camera RTSP dashboard
- Ultralytics YOLO26 Documentation
- YOLO World v2 Documentation
- streamlit-webrtc – Browser-native webcam
- Streamlit Documentation
- ByteTrack Paper
- Blog Series – Building this App
This project is open-source and available for educational and research purposes.
- Ultralytics for YOLO26 and YOLO World v2
- Streamlit for the web framework
- streamlit-webrtc for browser-based webcam
- All 400+ stargazers for the love and support!
If you find this project useful, please consider giving it a ⭐!
Made with ❤️ by Aparsoft