Skip to content

Latest commit

 

History

History
125 lines (104 loc) · 9.06 KB

File metadata and controls

125 lines (104 loc) · 9.06 KB

Codebase Review

Higher-level suggestions that were not applied directly.


Open

  • add RegionLink::is_empty and same for link conditions to remove manual > 0 checks at callsites. break region link should be async. never use run_closure_in with Command if you can easily make the call site async, use async command then to not block calling thread on sync cmds

IPv6 link-local / RA-driven provisioning branch (dev)

DeviceSetupData clonedev: dev.clone() in DeviceBuilder::build(). All fields (id, name, ns, mtu, interfaces) are used in setup_device_async, so the clone is structurally required.

add_host hardcodes /24 assumption (low)

add_host(cidr, host) replaces only the last octet, which only works for /24 subnets. If the allocator ever moves to /16 or /25, this will silently produce wrong addresses.

nftables via netlink (won't fix)

Rust crates for nftables via netlink (nftnl, rustables, mnl) exist but have immature APIs. Not worth replacing Command::new("nft") for now.

TOML config ignores regions (medium)

from_config() parses regions from TOML but does not call add_region() or link_regions(). Region topologies can only be built programmatically.

Link condition loss is egress-only (medium)

tc netem loss on a device interface only affects outbound packets. A "50% loss" link actually delivers ~50% in one direction and 100% in the other, which does not match how real lossy links (e.g. WiFi) behave. We should either apply netem on both ends of the veth pair, or use a single ingress + egress qdisc setup (e.g. ifb mirroring) so that LinkCondition::Wifi gives symmetric loss. The test bounds in loss_udp_moderate are currently widened to paper over this. Needs a design decision on where to apply qdiscs.

spawn_reflector is fire-and-forget (medium)

spawn_reflector enqueues a task on the namespace worker and returns immediately. There is no guarantee the reflector socket is bound by the time callers start sending probes, which causes intermittent test failures. The API should be async (return once the socket is bound) and return a drop guard that stops the reflector when dropped. This would eliminate the manual sleep(300ms) after every spawn_reflector call in tests.

ip route replace shelling in break/restore (low)

break_region_link() and restore_region_link() use Command::new("ip") to replace routes. These could use the netlink API (nl_run + Netlink) for consistency, but the sync run_closure_in path avoids async overhead for these rare operations.


Completed

  1. Dead apply_region_latency_dual + Qdisc builder — already deleted in earlier refactors; Qdisc struct remains, used by apply_impair/remove_qdisc.
  2. Ipv6Profile redundant variants — added doc comment noting future divergence intent (RA intervals, prefix delegation, SLAAC vs DHCPv6). Variants kept for RouterPreset::recommended_ipv6_profile() mapping.
  3. OutDir enum + runner uses Exact — replaced separate outdir/run_dir fields on LabOpts with OutDir enum (Nested creates timestamped subdir, Exact writes directly). Runner uses OutDir::Exact, removing need for PATCHBAY_OUTDIR env var in e2e tests.
  4. Server recursive scandiscover_runs scans up to 3 levels deep, serves single-run if base dir itself contains events.jsonl, skips symlinks to avoid duplicates from runner's latest link.
  5. Arc<str> migration + NetworkCore method extraction - All internal String fields migrated to Arc<str> (clones become refcount bumps). Lock-body logic extracted into NetworkCore methods (prepare_link_regions, prepare_add_iface, prepare_replug_iface, remove_device, remove_router, resolve_link_target, remove_device_iface, renew_device_ip, add_dns_entry, router_nat_v6_params, etc.) with purpose-built setup structs. Eliminates let x; { lock; x = ...; } pattern throughout lab.rs and handles.rs.
  6. Mutex/lock architecture overhaul - LabInner struct with netns + cancel outside the mutex; with()/with_mut() helpers on handles; cached name/ns on Device/Router/Ix; per-node tokio::sync::Mutex<()> for operation serialization; parking_lot::Mutex for topology lock; all handle mutation methods made async; pre-await reads combined into single lock; set_nat_v6_mode write order fixed
  7. No-panics refactor - Device/Router handles return Result or Option instead of panicking on removed nodes; spawn() returns Result<JoinHandle>; with_device/with_router return Option<R>; all test call sites updated
  8. Region index overflow - region_base(idx) uses checked_mul(16).expect() instead of unchecked arithmetic
  9. Fix doc typo - removed (aka LinkCondition) redundancy from lab.rs module doc
  10. Consolidate test helpers - removed probe_udp_from, spawn_tcp_echo_in, sync udp_send_recv_count; all callers migrated to test_utils equivalents
  11. Remove dead region code - deleted unused allocators and fields
  12. Combine consecutive nl_run blocks - merged v4 + v6 return-route calls in setup_router_async
  13. Replace parse().unwrap() with direct construction - net4(), net6(), region_base() helpers
  14. RouterBuilder::error helper - deduplicated 15-field struct literal in error paths
  15. Real PMTU blackhole test - verifies MTU + block_icmp_frag_needed drops oversized packets
  16. spawn_command_async - added on Device, Router, and Ix
  17. Unify NAT API - removed RouterBuilder::nat_config(); users pass Nat::Custom(cfg)
  18. Remove deprecated aliases - removed NatMode, switch_route, set_impair, etc.
  19. API rename Impair -> LinkCondition - enum, fields, methods, and presets all renamed
  20. Suppress stderr on tc commands - stderr captured via Stdio::piped() + .output()
  21. ensure_root_ns race condition - eliminated by making Lab::new() async
  22. DeviceIface::ip() returns Option - v6-only devices return None
  23. ObservedAddr wrapper - converted to pub type ObservedAddr = SocketAddr
  24. DNS overlay at creation time - create_netns(name, dns_overlay) instead of set-after-create
  25. Merge ns creation + async worker - single Worker::spawn() saves 1 thread per namespace
  26. Extract apply_mount_overlay() - shared by async/sync workers, user threads, blocking pool
  27. Remove redundant nft flush ruleset - fresh namespaces have no rules
  28. Dead replace_default_route_v6 - removed unused method
  29. Duplicate docstring in apply_nat_for_router - removed
  30. DnsEntries::new() panics - NetworkCore::new() now returns Result
  31. Impair::Wifi/Mobile doc inaccuracy - corrected to match actual values
  32. eprintln! in apply_impair_in - replaced with tracing::warn!
  33. saturating_add on allocators - all 7 allocators use checked_add + bail!
  34. Remove unnecessary cleanup, simplify Netlink - fd-only namespaces, removed ResourceList and hooks
  35. Drop NETSIM_NS_* from env_vars - callers use handle methods instead
  36. resources() -> ResourceList::global() - associated function instead of free fn
  37. Core internalization - NetworkCore and helpers made pub(crate)
  38. nl_run closure noise - RouterSetupData/IfaceBuild derive Clone
  39. Migrate tests to handle API - ~200+ call sites migrated, dead Lab methods removed
  40. Remove repetitive/legacy patterns - debug test removed, helpers consolidated
  41. Variable clones before nl_run - structurally required, accepted as-is
  42. RouterData::wan_ifname() helper - deduplicates wan interface name logic
  43. RouterBuilder - builder pattern matching DeviceBuilder
  44. Core fns simplified to async fn - set_link_state/replace_default_route use netlink directly
  45. Persistent Netlink per namespace - created once per AsyncWorker
  46. Test suite debugging - fixed 5+ failing tests across nat/link/region tests
  47. Async namespace worker redesign - two workers per namespace with TaskHandle<T>
  48. Lab::init_tracing() cleanup - replaced with patchbay_utils::init_tracing()
  49. Dead iperf UI table - removed from UI
  50. Duplicate spawn_reflector_in - consolidated into test_utils.rs
  51. Remove RouterId type aliases - all code uses NodeId
  52. Bridge/namespace naming - moved into NetworkCore
  53. Split lib.rs monolith - into lab.rs + config.rs
  54. shared_cache_root removal - callers pass cache_dir explicitly
  55. url_cache_key allocations - replaced with write! buffer
  56. StepTemplateDef expansion - not applied, already correct
  57. SimFile/LabConfig duplication - #[serde(flatten)] applied
  58. stage_build_binary dedup - not applied, paths diverge
  59. write_progress/write_run_manifest twins - extracted write_json helper
  60. CaptureStore accessor pattern - private fn lock() helper added
  61. artifact_name_kind allocations - returns (&str, bool) now
  62. Multi-pass router resolution - accepted as-is for current topology sizes
  63. VmBinarySpec duplication - unified via shared crate dependency