Skip to content

Conversation

@WilhelmZA
Copy link

@WilhelmZA WilhelmZA commented Sep 28, 2025

Description

This pull request implements a complete structured traceroute subsystem and offline-first IP enrichment pipeline for Hyperglass. It transitions traceroute output from raw text dumps into typed, enriched data objects and adds end-to-end UI support to render and interact with structured traceroute results.

The change set includes parser implementations for multiple vendor formats, a fast offline enrichment backend (ASN/org/RDNS/RPKI), UI components to display enriched traceroute data, concurrency and caching improvements, and documentation/sample configuration to operate the feature.

This pull request includes the amazing work of https://github.com/CarlosSuporteISP who introduced structured bgp route for a number of platforms, this expands his existing PR: #340 with enhancements to structured traceroute and ip_enrichment.


Major components

Structured traceroute architecture

  • New data models: TracerouteResult and TracerouteHop with rich per-hop metadata (ip, asn, org, rdns, prefix, rpkistatus, country, loss, sent, last, avg, best, worst, stddev, status).
  • Platform-specific parsers and a normalization layer so different vendor outputs map to a single structured format consumed by the API and UI.
  • Unified output format that supports both structured JSON responses and cleaned raw text when structured parsing is disabled.
  • AS-path visualization enhancements (organization names, improved React Flow integration).

IP enrichment system

  • Offline-first enrichment using BGP.tools bulk data and PeeringDB with a pickle-backed cache and integer-based CIDR lookups for performance.
  • Metadata produced per-hop: ASN, organization name, country code, covering prefix, RDNS, optional RPKI state.
  • Graceful handling for private IP ranges and missing lookup data; fallbacks to network lookups only when configured.
  • Threaded executor and query deduplication to make enrichment non-blocking and efficient under load.

Supported platforms (parsers)

  • MikroTik: Parser supports multi-table output, progressive statistics, and incremental SENT counts. A dedicated cleaner merges repeated tables and collapses trailing timeout hops.
  • Huawei: Parser for VRP/Unix-style traceroute outputs with robust timeout handling.
  • Juniper / Arista / FRR: Additional vendor parsers and normalization glue to ensure consistent hop ordering and fields.

Frontend enhancements

  • New structured traceroute components: traceroute-table, traceroute-fields, traceroute-cell.
  • Improved AS path visualization that resolves ASN → org names for human-friendly rendering.
  • Copy-to-clipboard and export functionality for structured JSON payloads.

Infrastructure improvements

  • Concurrency: non-blocking enrichment with a thread executor to avoid holding request threads for enrichment lookups.
  • Query deduplication: avoid duplicate enrichment work for identical IPs/hops requested concurrently.
  • Pickle-backed caches and integer CIDR matching yields large speed improvements and drastically reduces external API calls.

Files changed (summary)

  • Rough totals: ~94 files changed; 13,602 insertions and 1,912 deletions.

  • Notable additions:

    • hyperglass/models/data/traceroute.py (TracerouteResult, TracerouteHop models)
    • hyperglass/models/parsing/mikrotik.py, huawei.py, traceroute.py
    • hyperglass/plugins/_builtin/trace_route_*.py (vendor traceroute plugins)
    • hyperglass/plugins/_builtin/traceroute_ip_enrichment.py
    • hyperglass/external/ip_enrichment.py (enrichment backend, cache ingestion)
    • hyperglass/ui/components/output/traceroute-*.tsx
    • documentation pages and sample configs in docs/pages/... and .samples/
  • Notable modifications:

    • hyperglass/plugins/_builtin/mikrotik_garbage_output.py — extensive cleaner and aggregation improvements
    • hyperglass/main.py — plugin registration and startup wiring
    • hyperglass/models/config/params.py, structured.py — new configuration options
    • README, Docker/compose, pyproject and lockfiles updated

See the branch diff for the full file list and exact changes.


Motivation and context

The prior traceroute implementation returned only raw text. That limited:

  • Automated extraction of per-hop metadata (ASN, org, prefix)
  • Rich UI experiences (per-hop controls, AS-path org names)
  • Performance due to many external lookups under naive enrichment

This feature introduces typed structured traceroute outputs combined with an offline-first enrichment strategy to provide accurate, fast, and stable metadata while significantly reducing network traffic to external services.

It transforms Hyperglass from a plain looking-glass utility to a structured, analysable traceroute platform with improved diagnostics and visualization.


Detailed technical description

Traceroute data model

The TracerouteResult contains metadata about the probe and an ordered list of TracerouteHop objects. Each hop includes network-level fields and enrichment fields. The model is serializable to JSON and validated with the project’s existing validation layer.

Fields include (non-exhaustive):

  • ip (string | null)
  • hop_index (int)
  • loss, sent, last, avg, best, worst, stddev (numerical statistics)
  • status (string — e.g., timeout/ok)
  • enrichment: { asn, org, prefix, country, rdns, rpki }

The model intentionally keeps hop order fixed (hop position is significant). Parsers must not deduplicate by IP address alone since the same IP can appear at different hops.

Vendor parsers & normalization

Each vendor parser parses raw text and returns an ordered list of hop rows with the above fields. The normalization layer coerces vendor-specific differences (different column order, missing fields) into the unified hop model.

MikroTik specifics:

  • MikroTik prints repeated tables during incremental statistics. The cleaner merges those tables into a single aggregated table by selecting, for each hop index, the row with the highest SENT count (prefers non-timeout rows on ties, and later tables on equal SENT). Excessive trailing timeouts are collapsed into a single aggregation line like “... (N more timeout hops)”.

Huawei specifics:

  • Huawei VRP formats and Unix-style traceroutes are normalized to the same hop fields, with explicit timeout handling.

IP enrichment pipeline

The enrichment component is implemented as an offline-first service that:

  1. Loads a BGP.tools-derived CIDR dataset and PeeringDB caches into an efficient in-memory/pickle-backed integer map.
  2. For each IP, performs fast prefix lookups (integer-range matching) to find the covering prefix and ASN.
  3. Resolves ASN → organization from local PeeringDB lookup.
  4. Optionally performs RDNS and RPKI checks (configurable; can be disabled). RPKI uses pluggable backends (routinator, etc.).

The enrichment plugin batches lookups and uses a thread pool so the request pipeline remains responsive. Query deduplication ensures multiple hops with the same IP don’t trigger duplicate lookups concurrently.

Plugin & request lifecycle changes

  • Output plugin ordering and per-request decision logic were adjusted so that structured parsing and enrichment only occur when allowed by request/device params or global config.
  • The structured top-level configuration acts as the authoritative switch; per-feature tri-state flags allow opt-outs (for enrichment subfeatures like RPKI or RDNS).

Frontend & UX changes

  • New components present structured fields in a table layout, with per-hop expansion to show enrichment details.
  • AS-path visualization shows organization names and clickable ASN details.
  • Copy to clipboard supports exporting the full structured JSON payload for a probe.
  • Loading states and progressive results are supported — the UI receives partial enrichment updates as caching and lookups complete.

Testing performed

Environment: Linux (Ubuntu/Debian), Python 3.11+, browsers: Chrome/Firefox/Safari.

Parser tests:

  • Unit tests created for vendor parsers; validated against real-world MikroTik and Huawei outputs.

Enrichment tests:

  • Integration tests for cache matching and PeeringDB association using sampled datasets.

Manual validation:

  • MikroTik multi-table aggregation and highest-SENT selection verified with lab captures.
  • Timeout-heavy traces validated to ensure trailing timeouts are collapsed appropriately.
  • UI rendering and copy/export behavior tested across browsers.

Load/Concurrency:

  • Simulated concurrent queries to confirm thread executor and deduplication behavior; verified no blocking of main request threads.

Automated:

  • Unit and integration tests included in the branch (run via pytest); frontend component tests added.

Behavioral & operational impact

  • Traceroute responses now can be returned as structured JSON for structured-enabled requests. This changes response shape for those requests — downstream consumers must opt out or adapt if they expect raw text.
  • Structured mode performs enrichment work; operational resources should be sized to accommodate cache memory and thread pool usage.
  • Offline caching reduces external API calls significantly but requires provisioning the BGP.tools and PeeringDB caches for full offline operation.

Configuration & deployment notes

  • Update hyperglass/models/config/params.py and hyperglass/models/config/structured.py as needed to enable or change enrichment behavior.
  • Provide cache directory and ingest BGP.tools dump and PeeringDB data for offline enrichment. See the included docs for ingestion commands and cache paths.
  • RPKI backend is configurable — set endpoint/backends in config if RPKI validation is desired.

Commit / PR context

This branch represents a single large feature release and can be squashed into one comprehensive commit for the upstream PR. The PR message included with the branch accurately reflects motivation, testing, and scope; this document provides a detailed technical explanation and operational notes.

- Add type checking to handle already-processed structured output
- Skip processing when output is not raw strings (e.g., BGPRouteTable objects)
- Prevents AttributeError when trying to call .strip() on tuple/structured data

Fixes crash when MikroTik garbage plugin runs after structured parsing plugins.
- Split type checking into two separate conditions for clarity
- Add early return when output contains non-string objects
- Ensure structured output like MikrotikBGPRouteTable passes through unchanged
- Prevent AttributeError on .strip() for tuple containing structured data

Resolves remaining crashes when processing already-structured BGP route data.
- Add IPv6 address pattern matching alongside IPv4
- Extract SENT count from traceroute lines to prioritize complete data
- Keep lines with highest SENT count (final results) instead of first occurrence
- Maintain original hop order while using best data for each IP
- Prefer non-timeout results when SENT counts are equal

Fixes issues where IPv4 showed early results (SENT=1) instead of final (SENT=3)
and IPv6 showed all intermediate steps instead of final consolidated results.
- Fix MED field showing N/A when value is 0 (now correctly shows 0)
- Hide Originator field when empty instead of showing N/A
- Hide Age field when not available instead of showing N/A
- Add HideableField component for fields that should be hidden when empty
- Update string output formatting to skip hidden fields in text export

Resolves display issues where valid zero values and unavailable fields
were showing as N/A instead of proper formatting or being hidden.
- Remove custom __init__ from HuaweiBGPRouteTable class
- Use standard BGPRouteTable inheritance like MikroTik does
- Add BGPRoute import and instantiate BGPRoute objects in bgp_table()
- Enable proper Pydantic validation for both table and routes
- Ensure consistent behavior across all BGP parsers
- Update .gitignore to exclude dev-build.sh and dev-docker/
…riable naming

- Add BGPRoute import to Juniper, Arista, and FRR parsers
- Change all parsers to instantiate BGPRoute objects instead of raw dictionaries
- Harmonize variable naming across all BGP parsers:
  - Use 'route_data' for dictionary variable
  - Use 'routes' for output list
  - Use 'route' for loop variable
- Ensure consistent validation behavior across MikroTik, Huawei, Juniper, Arista, and FRR
- All parsers now properly invoke BGP table class validation
- Add BGPRoute import and export in hyperglass.models.data.__init__.py
- This allows BGP parsers to properly import BGPRoute for validation
- Resolves ImportError when starting hyperglass application
MAJOR NEW ARCHITECTURE - STRUCTURED TRACEROUTE:
- Complete rewrite of traceroute data processing with structured output
- Dedicated TracerouteResult and TracerouteHop data models
- Platform-specific parsers with unified output format
- Rich metadata including ASN, organization, country, and prefix information
- AS path visualization with organization names in React Flow charts

SUPPORTED PLATFORMS:
- TraceroutePluginMikrotik: Handles MikroTik's complex multi-table format
  * Progressive statistics parsing with deduplication
  * Timeout hop handling and continuation line processing
  * Loss percentage and RTT statistics extraction
- TraceroutePluginHuawei: Unix-style traceroute format parser
  * Standard hop_number ip_address rtt format support
  * Timeout hop detection with * notation
  * Automatic cleanup of excessive trailing timeouts

COMPREHENSIVE IP ENRICHMENT SYSTEM:
- Offline enrichment using BGP.tools bulk data (1.3M+ CIDR entries)
- PeeringDB integration for IXP detection and ASN organization data
- Ultra-fast pickle cache system with combined data files
- Integer-based bitwise IP matching for maximum performance
- Bulk ASN organization lookup capabilities
- Private/reserved IP handling with AS0 fallbacks
- Country code mapping from ASN database
- Graceful fallbacks for missing enrichment data

FRONTEND ENHANCEMENTS:
- New traceroute table components with consistent formatting
- Enhanced AS path visualization with organization names
- Improved copy-to-clipboard functionality with structured data
- Unified table styling across BGP and traceroute results
- Better error handling and loading states

CONCURRENT PROCESSING INFRASTRUCTURE:
- Thread executor implementation for blocking I/O operations
- Query deduplication system to prevent resource conflicts
- Non-blocking Redis cache operations using asyncio executors
- Event coordination for waiting requests
- Background cleanup for completed operations
- Prevents website hangs during long-running queries

PLUGIN ARCHITECTURE IMPROVEMENTS:
- Platform-aware plugin system with proper execution restrictions
- Enhanced MikroTik garbage output cleaning
- IP enrichment plugins for both BGP routes and traceroute
- Conditional plugin execution based on platform detection
- Proper async/sync plugin method handling

CRITICAL BUG FIXES:
- Fixed double AS prefix bug (ASAS123456 → AS123456)
- Resolved TracerouteHop avg_rtt field/property conflicts
- Corrected Huawei traceroute source field validation
- Fixed plugin platform restriction enforcement
- Eliminated blocking I/O causing UI freezes
- Proper timeout and empty response caching prevention
- Enhanced private IP range detection and handling

PERFORMANCE OPTIMIZATIONS:
- Pickle cache system reduces startup time from seconds to milliseconds
- Bulk processing for ASN organization lookups
- Simplified IXP detection using single PeeringDB API call
- Efficient CIDR network sorting and integer-based lookups
- Reduced external API calls by 90%+
- Optimized memory usage for large datasets

API & ROUTING ENHANCEMENTS:
- Enhanced API routes with proper error handling
- Improved middleware for concurrent request processing
- Better state management and event handling
- Enhanced task processing with thread pool execution

This represents a complete transformation of hyperglass traceroute capabilities,
moving from basic text output to rich, structured data with comprehensive
network intelligence and concurrent processing support.
WilhelmZA and others added 2 commits September 30, 2025 16:46
Summary:
- Add structured traceroute support with comprehensive IP enrichment (ASN/org/RDNS).
- Improve MikroTik traceroute cleaning and aggregation; collapse repeated tables into a single representative table.
- Enhance traceroute logging for visibility and add traceroute-specific cleaning helpers.
- Add/adjust IP enrichment plugins and BGP/traceroute enrichment integrations.
- UI updates for traceroute output and path visualization; update docs and configuration for structured output.

This commit squashes changes from 'structured-dev' into a single release commit.
@WilhelmZA
Copy link
Author

@jdhall75 I have finished my commits for the moment and ready for review. This commit covers all the aspects of the pull request opened by Carlos: #340

We should consider doing this as one pull to main @CarlosSuporteISP ?

@WilhelmZA WilhelmZA changed the title Add structured traceroute support with IP enrichment and concurrent processing + bug fixes for mikrotik feat(structured): add structured traceroute + comprehensive IP enrichment and UI Updates Sep 30, 2025
@jdhall75
Copy link
Collaborator

So.. for future reference. PRs should be small and focused on a particular function of the application. It's considered the #1 rule for PRs. I know with AI it's easy to get carried away and make broad sweeping changes, but a human has to review it.

I appreciate the all the contributions and I am sure many others will too. This is just something to keep in mind for the next one.

As soon as we get some feedback from @CarlosSuporteISP I'll start the review and close out one of the PRs.

@jdhall75 jdhall75 self-requested a review October 1, 2025 00:48
Copy link

@hcaldicott hcaldicott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up that this PR contains modifications to the compose.yaml file which I don't feel should be included. I have only had a cursory glance at this so far in my own testing but a much more thorough review is needed with this in mind.

I don't have much feedback in the way of the PR as a whole, as I'm completely new to Hyperglass and I am using this PR in order to stand up the project for evaluation. Apologies for chiming in our of no-where with this comment otherwise :)

MAJOR ENHANCEMENTS:

IP Enrichment Service (hyperglass/external/ip_enrichment.py):
- Increase IXP data cache duration from 24 hours to 7 days (604800s) for better performance
- Fix critical cache refresh logic: ensure_data_loaded() now properly checks expiry before using existing pickle files
- Remove 'force' refresh parameters from public APIs and admin endpoints to prevent potential abuse/DDOS
- Implement automatic refresh based on file timestamps and cache duration
- Add comprehensive debug logging gated by Settings.debug throughout the module
- Clean up verbose comments and improve code readability
- Update configuration model to enforce 7-day minimum cache timeout

MikroTik Traceroute Processing:
- Refactor trace_route_mikrotik plugin to use garbage cleaner before structured parsing
- Only log raw router output when Settings.debug is enabled to reduce log verbosity
- Simplify MikrotikTracerouteTable parser to expect pre-cleaned input from garbage cleaner
- Remove complex multi-table detection, format detection, and deduplication logic (handled by cleaner)
- Add concise debug messages for processing decisions and configuration states

Traceroute IP Enrichment (traceroute_ip_enrichment.py):
- Implement concurrent reverse DNS lookups using asyncio.to_thread and asyncio.gather
- Add async wrapper for reverse DNS with proper error handling and fallbacks
- Significant performance improvement for multi-hop traceroutes (parallel vs sequential DNS)
- Proper debug logging gates: only detailed logs when Settings.debug=True
- Upgrade operational messages to log.info level (start/completion status)
- Maintain compatibility with different event loop contexts and runtime environments

Configuration Updates:
- Update structured.ip_enrichment.cache_timeout default to 604800 seconds
- Update documentation to reflect new cache defaults and behavior
- Remove force refresh options from admin API endpoints

MIGRATION NOTES:
- Operators should ensure /etc/hyperglass/ip_enrichment directory is writable
- Any code relying on force refresh parameters must be updated
- Monitor logs for automatic refresh behavior and performance improvements
- The 7-day cache significantly reduces PeeringDB API load

PERFORMANCE BENEFITS:
- Faster traceroute enrichment due to concurrent DNS lookups
- Reduced external API calls with longer IXP cache duration
- More reliable refresh logic prevents stale cache usage
- Cleaner, more focused debug output when debug mode is disabled

TECHNICAL DETAILS:
- Uses asyncio.to_thread for non-blocking DNS operations
- Implements process-wide file locking for safe concurrent cache updates
- Robust fallbacks for various asyncio execution contexts
- Maintains backward compatibility while improving performance

FILES MODIFIED:
- hyperglass/external/ip_enrichment.py
- hyperglass/models/config/structured.py
- hyperglass/api/routes.py
- hyperglass/plugins/_builtin/trace_route_mikrotik.py
- hyperglass/models/parsing/mikrotik.py
- hyperglass/plugins/_builtin/traceroute_ip_enrichment.py
- docs/pages/configuration/config/structured-output.mdx
@WilhelmZA
Copy link
Author

WilhelmZA commented Oct 5, 2025

Manually cleaned up and reviewed the following files, fixing some bugs and issues there was with edge cases and found by users:
FILES MODIFIED:

  • hyperglass/external/ip_enrichment.py
  • hyperglass/models/config/structured.py
  • hyperglass/api/routes.py
  • hyperglass/plugins/_builtin/trace_route_mikrotik.py
  • hyperglass/models/parsing/mikrotik.py
  • hyperglass/plugins/_builtin/traceroute_ip_enrichment.py
  • docs/pages/configuration/config/structured-output.mdx

Details: 4a10576

- Remove ip_enrichment_status and ip_enrichment_refresh endpoints from API routes
- Remove unused imports and handlers from API initialization
- Background refresh functionality in query endpoint remains intact
- Cleanup dead code that had no UI or client integration
IP Enrichment Changes:
- Remove PeeringDB dependency and file caching infrastructure
- Implement real-time BGP.TOOLS WHOIS API integration (port 43)
- Simplify IPEnrichmentService with bulk query support
- Add comprehensive error handling and logging
- Reduce module from 1,545 to 641 lines (58% reduction)
- Update documentation to reflect BGP.TOOLS-only approach

Validation System Fixes:
- Fix Pydantic v2 model validator signatures across multiple classes
- Use Field(default_factory=...) pattern to prevent premature instantiation
- Resolve KeyError: None in ThemeColors.validate_colors validator
- Fix Credential.validate_credential method signature for 'after' mode
- Ensure proper validation context for all model validators

MikroTik Improvements:
- Add intelligent retry logic for empty BGP route responses
- Retry up to 2 times when execution time < 5 seconds
- Include 2-second delay between retries for device stabilization

Security Enhancements:
- Sanitize connection error messages to remove sensitive network information
- Remove 'Device settings: platform IP:port' from user-facing errors
- Apply sanitization to DeviceTimeout, ScrapeError, AuthError, RestError
- Provide generic fallback messages for empty errors

UI Improvements:
- Hide 'None' group filter when no directive groups exist
- Filter out directives with null/empty names from query type dropdown
- Improve UI cleanliness and user experience

Configuration Updates:
- Clean up structured output configuration comments
- Remove deprecated cache_timeout and refresh settings
- Update model field definitions for consistency
- Updated FRR traceroute parser to handle new simplified output format
- Added comprehensive regex patterns for various traceroute line formats
- Improved hop parsing with support for timeouts, hostnames, and multiple RTTs
- Enhanced IP address validation and hostname handling
- Added detailed debug logging for troubleshooting
- Cleaned up duplicate hop handling and sorting
- Maintained backward compatibility with existing traceroute output
- Removed experimental MTR functionality that had timeout issues
- Restored all BGP directives (BGP, BGPRoute, BGPCommunity, BGPASPath)
- Streamlined plugin to focus on reliable traditional traceroute parsing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants