ml_inference

Vortex training pipeline — knowledge distillation for monocular depth + 6-class semantic segmentation, designed for ROS 2 / Nav2 deployment on a Jetson Orin Nano.

Six-panel video showing V9 deployment on the corridor evaluation set: RGB, raw ToF, DA3 reference, V9 inference, ToF+DA3 fusion, ToF+V9 fusion

V9 (Lighthouse) production deployment on the 459-frame corridor evaluation set. Top row: RGB input · raw Femto Bolt ToF depth (showing the 79.7 % dead pixel rate that motivates the fusion) · zero-shot DA3-Small reference depth. Bottom row: V9 raw inference · ToF + DA3 fusion (foundation-model baseline) · ToF + V9 fusion (production deployment realization, consumed by the local costmap). Verified end-to-end inside the Docker container — see Docker Reproducibility.

Overview

This repository is the off-board half of a bootstrap-perception system for indoor robot navigation under hardware depth failure. It contains the training pipeline that produces compact monocular depth and 6-class semantic segmentation student models distilled from large foundation teachers, the offline evaluation scaffold that measures their accuracy, and the export tooling that converts trained checkpoints to TensorRT engines for on-vehicle deployment.

The system addresses a specific operational problem. The deployment camera (Orbbec Femto Bolt Time-of-Flight) returns valid depth on approximately 22 % of pixels in the target environment; the remaining 78 % are lost to reflective surfaces (polished floors, glass walls, mid-field returns beyond sensor range). The bootstrap-perception strategy uses the surviving valid pixels to anchor a learned monocular depth prediction to metric scale, then fuses the two signals per pixel to produce a dense depth signal that the local costmap can consume. The position this work supports:

Monocular depth estimation cannot replace structured-light or Time-of-Flight depth in indoor navigation, but it is an effective fusion partner. Fusing LiDAR with confidence-gated learned depth recovers approximately 55 % more occupied costmap cells in narrow corridors than LiDAR alone, and produces dense geometry where the hardware sensor returns invalid pixels.

The runtime ROS 2 nodes (Depth Fusion, Class Costmap, Student TRT, YOLO TRT), the Nav2 configuration, and the live deployment harness on the Traxxas Maxx 4S testbed live in the sibling repository NCHSB. The two repositories are deliberately decoupled along the off-board / on-vehicle boundary; their dependency stacks, release cadences, and audiences differ.

Component	This repository (`ml_inference`)	Sibling repository (`NCHSB`)
Teacher inference on HPC	DA3-Metric-Large + YOLOv8 + SAM2-Large	—
Student training (V1 → V9 lineage)	All 9 configurations	—
Offline evaluation pipelines	Per-pixel depth, costmap ablation, calibration sensitivity	—
Export toolchain	ONNX → TensorRT FP16 / INT8, Jetson micro-benchmarks	—
Runtime ROS 2 nodes	—	Depth Fusion, Class Costmap, Student TRT, YOLO TRT
Navigation configuration	—	`nav2_params_rc.yaml`, controllers, EKF, launch files
Deployment platform	—	Live integration on Traxxas Maxx 4S
Simulation harness	—	Gazebo Fortress worlds, rosbag replay

Project highlights

Bootstrap perception architecture. The Orbbec Femto Bolt Time-of-Flight sensor returns valid depth on 22.21 % of pixels in the corridor evaluation set; the surviving pixels anchor a learned monocular prediction to metric scale via per-frame median alignment. The runtime fusion (per-pixel confidence-gated substitution) preserves hardware geometry where available and substitutes the calibrated learned signal where the sensor failed.
Corridor specialist with closed-loop validation. The V9 student (Lighthouse) achieves 0.382 m LILocBench corridor RMSE and matches ground-truth-depth navigation performance in Gazebo Fortress closed-loop trials (9 / 10 success at 10 seeds, 0 collisions, time-to-goal within 0.22 s of the ground-truth baseline).
Foundation-model deployment headline. DA3-Small zero-shot inference benchmarks at 218 FPS / 4.6 ms / 2.7 GB on the Jetson Orin Nano (TensorRT FP16, 308 × 308 input). The V-series students complement the foundation model as corridor-domain specialists rather than replacing it.
Specification-aligned training target. The hybrid depth supervision in models/losses.py:HybridDepthLoss selects DA3 teacher depth where available and falls back to ToF measurement otherwise. The frame-level training rule and the per-pixel deployment fusion implement the same supervision principle (prefer hardware ground truth where available; fall back to the learned signal where not) at the granularity appropriate to each stage.
Auditable results. Every reported number traces to a JSON file under results/. Quantitative claims are reproducible from the offline evaluation pipeline (see Quick start below). The deferred APE / SLAM evaluation is documented in § Limitations and disclosures.

Architecture

Two halves of one system. The dashed boundary is where this repo ends and NCHSB begins.

Off-board (this repo, on NYU HPC):

teacher_infer/run_da3.py produces metric-depth supervision per frame from DA3-Metric-Large (metric = focal · raw / 300).
teacher_infer/run_sam2.py produces 6-class segmentation labels by combining YOLOv8 detections (person, furniture), SAM2-Large mask refinement, and geometric heuristics for floor / wall / glass.
teacher_infer/build_manifest.py emits a manifest.jsonl linking every RGB to its DA3 depth, SAM2 seg, ToF depth, and ToF confidence.
train.py trains an EfficientViT-B1 student (5.31 M params) with a hybrid depth loss + cross-entropy seg + edge-aware smoothness, optionally Kendall-uncertainty-weighted.
export_trt.py exports ONNX → TensorRT FP16 / INT8 engines for the Jetson.

On-board (NCHSB, on the Jetson): Student TRT Node consumes RGB and publishes /student/depth + /student/segmentation. Depth Fusion Node combines the student output with the surviving ToF pixels and publishes /perception/fused_depth. Point Cloud XYZ Node back-projects that into /perception/fused_depth_points, which Nav2's local costmap consumes.

The student lineage (V1 → V9)

The repository's main reflects the V4-V9 production codebase. The earlier V1-V3 baseline (MobileNetV3-Small + DA2-Large) is preserved verbatim under archive/v1-v3-baseline/ and reproducible exactly via the v1-v3-baseline git tag.

Corridor RMSE is reported on two different sets: LILocBench (Bonn corridor benchmark, used for fine-tuning V7 and V9) and Femto Bolt (our own indoor corridor recordings, used as the deployment-truth set). They are not interchangeable — LILocBench is shorter and structurally simpler.

Version	Codename	Backbone	Teacher	NYU RMSE	LILocBench RMSE	Femto Bolt RMSE	Configuration change	Outcome
V1	Compass	MobileNetV3-Small	DA2-Large	75.37 m	—	—	Initial distillation	Baseline. Unit-space mismatch (DA2 outputs relative depth) dominates the result.
V2	Sextant	MobileNetV3-Small	DA2-Large	—	—	—	Kendall log-σ² clamp experiments	Diagnostic; ruled out loss weighting as the dominant cause of V1's result.
V3	Anchor	MobileNetV3-Small	DA3-Large	1.160 m	—	—	berHu loss + Kendall weighting + two-rate optimizer	First metric-scale predictions. Recipe set at V3 and held constant through V9.
V4	Pivot	EfficientViT-B1	DA3-Large	0.774 m	—	1.373 m	Encoder substitution	−33 % NYU RMSE at fixed recipe. Architecture inherited by V5–V9.
V5	Atlas	EfficientViT-B1	DA3-Large	0.572 m	—	2.186 m	Deployment-targeted augmentation pipeline	Production: general indoor. Largest single-step NYU improvement (−26 %).
V6	Cornerstone	EfficientViT-B1	DA3-Large	0.519 m	—	2.158 m	Multi-domain pretraining (SUN+DIODE) → NYU fine-tune	Production: fine-tuning base. Best NYU result in lineage.
V7	Tunnel	EfficientViT-B1	DA3-Large	1.315 m	0.445 m	1.982 m	V5 → LILocBench fine-tune	First corridor specialization. NYU regression characteristic of single-domain fine-tuning.
V8	Confluence	EfficientViT-B1	DA3-Large	0.592 m	—	2.266 m	Joint NYU + LILocBench training from V5	Joint-domain training does not improve corridor performance over the V5 baseline (Pareto-dominated by V9).
V9	Lighthouse	EfficientViT-B1	DA3-Large	1.553 m	0.382 m	1.589 m	V6 → LILocBench fine-tune	Production: corridor specialist. Closed-loop validated against ground-truth depth in simulation.

Recipe details and per-version provenance are on the model lineage page of the project blog. Each version (V1 through V9) has a dedicated page documenting the configuration change, the experimental outcome, and the design rationale.

Selection guidance

Use case	Recommended checkpoint
General-purpose indoor depth estimation	V5 (Atlas) — `NYU 0.572 m`, balanced segmentation mIoU 63.7 %
Fine-tuning base for additional domain specialists	V6 (Cornerstone) — `NYU 0.519 m`, best NYU result; recommended initialization
Production corridor specialist (closed-loop validated)	V9 (Lighthouse) — `LILocBench 0.382 m`, 9 / 10 Gazebo success matching ground-truth depth
Maximum-throughput foundation-model inference on Jetson	DA3-Small zero-shot (218 FPS / 4.6 ms / 2.7 GB) — production runtime; V-series students complement it as domain specialists

Results

Numbers below are reproduced from paper_stats.json and the per-experiment JSON files in results/. Every row has an n and a 95 % CI in the source files; only the headline is shown here.

DA3-Small zero-shot on NYU val (Table VII, teacher row)

Metric	Value	n
RMSE (m)	0.513 ± 0.038	290
AbsRel	0.124 ± 0.008	290
δ < 1.25 (%)	85.2 ± 1.6	290
δ < 1.25² (%)	95.3 ± 0.7	290
Latency (ms, PyTorch)	65.3	—

Source: results/nyu_da3_da3-small_val.json. The 218 FPS / 4.6 ms / 2.7 GB headline is the same model under TensorRT FP16 at 308 × 308 on Jetson Orin Nano (separate Jetson benchmark; not from this PyTorch run).

Pixel-level depth on the corridor (459 frames)

Method	RMSE (overall)	RMSE (near 0-1.5 m)	RMSE (mid 1.5-3 m)	RMSE (far 3-6 m)	δ < 1.25 (%)
Sensor (ToF)	0.000	0.000	0.000	0.000	100
DA3-Small	0.522	0.145	0.503	1.305	53.4
V9 student	1.418	1.642	1.461	1.012	17.2

Sensor RMSE is zero by construction — sensor pixels are the "ground truth" against which everything else is scored on the surviving 22.21 % valid mask. Source: results/pixel_fusion.json.

DA3-Small dominates V9 on this metric. V9 wins the corridor specialist benchmark on LILocBench (above) but is not the right pick for general per-pixel accuracy.

Costmap recovery (Table III, n = 459 frames)

Config	IoU	Detection rate (%)	FPR (%)	Inflation radius (m)	Timing (ms)
Baseline	1.000	100.0	0.0	0.090	16.5
A1 (depth only)	1.000	100.0	0.0	0.177	63.3
A3 (L+D, fixed inflation)	0.379	100.0	5.2	0.177	133.8
A4 (L+D, adaptive)	0.279	76.7	5.2	0.165	132.6
A5 (L+D, large inflation)	0.379	100.0	5.2	0.192	206.2
A6 (L+D, conservative)	0.279	76.7	5.2	0.197	189.9

Source: results/costmap_ablation/corridor/summary.json. Headline result: L → L+D adds +55 % occupied cells in narrow corridors (2 295 → 3 546 mean occupied; paper_stats.json:table_iv).

LILocBench dynamics_0: 10 pedestrians moving through the scene. L+D recovers pedestrian bodies that L misses entirely.

Live Nav2 costmap during corridor replay. Green = pixels filled by DA3, blue = pixels filled by V9, white = surviving ToF.

Limitations and disclosures

The 5.2 % false-positive rate measured in the L + D configuration is a real operational cost of fusion. Decomposition (results/fpr_audit.json):

49.3 % model hallucinations (depth model predicts an obstacle where none exists)
34.6 % sensor-invalid fill (depth model assigns obstacle status to a pixel where ToF would have reported invalid)
18.1 % costmap inflation artifacts (geometry correct, inflation radius too aggressive)

Additional disclosures relevant to reproducing or quoting reported numbers:

APE / SLAM evaluation deferred to future work. A preliminary measurement reported a 73 % APE improvement (1.23 m vs 4.63 m) but used asymmetric rosbag playback rates between configurations (1.0× for LiDAR-only, 0.5× for fused-depth to allow inference completion). Slower playback runs SLAM Toolbox loop closure more aggressively, which reduces APE independently of the depth-fusion contribution. The preliminary numbers are retained in paper_stats.json:table_vi for traceability but are not cited as a reported result. Matched-playback re-evaluation on the deployment hardware is identified as future work.
INT8 calibration in export_trt.py defaults to random tensors. The plumbing for INT8 quantization is implemented and functional; the accuracy of the resulting INT8 engine without calibration data is not validated. For deployment INT8, supply --calib-images <dir> pointing at representative corridor frames. All reported Jetson runtime numbers (218 FPS / 4.6 ms / 2.7 GB on DA3-Small) use FP16, not INT8.
benchmark_jetson.py reports depth RMSE only. The segmentation mIoU column is initialized but not populated. Latency and depth-RMSE measurements from this script are valid; mIoU values from this script should not be used.
V9 is specialized for corridor environments, not a general-purpose model. NYU val RMSE (1.553 m) substantially exceeds V5 (0.572 m) and V6 (0.519 m) — a documented consequence of single-domain fine-tuning under standard catastrophic-forgetting dynamics. The tradeoff is intentional: V9 is the recommended checkpoint when the deployment domain is restricted to corridors; for general indoor scenes, V5 or V6 is preferred.

Quick start

Local CPU validation (laptop)

Validates the full pipeline end-to-end on a tiny subset before pushing to HPC. Downloads ~2.8 GB of NYU Depth V2 the first time.

conda create -n vortex_ml python=3.10 -y && conda activate vortex_ml
pip install -r requirements.txt

# Smoke test — 2 epochs, batch 4, 50 frames, CPU
python train.py --epochs 2 --batch-size 4 --device cpu --data-limit 50

# ONNX-only export (no TRT on laptop)
python export_trt.py --checkpoint checkpoints/best.pt --skip-trt

HPC training (NYU Torch)

ssh <NetID>@login.torch.hpc.nyu.edu
cd $HOME && git clone https://github.com/Nishant-ZFYII/ml_inference.git ml_pipeline
bash ml_pipeline/setup_hpc.sh    # creates $SCRATCH/conda_envs/nchsb_ml

# Verify partitions for your account
sinfo
# Edit train.slurm + teacher_infer/teacher_infer.slurm if --partition or --gres differ

# Teacher inference on NYU val
sbatch ml_pipeline/teacher_infer/teacher_infer.slurm

# Train V4-V9-style student
sbatch ml_pipeline/train.slurm

# Distillation eval (Table IV equivalent)
python eval_distillation.py \
    --checkpoint $SCRATCH/checkpoints/best.pt \
    --manifest   $SCRATCH/nyu_teacher_data/manifest.jsonl

Corridor specialisation (V7 / V9)

Once you have V5 or V6 weights, fine-tune on LILocBench:

# 1. Local: extract corridor frames from rosbag (Linux box where the bag lives)
python -m teacher_infer.extract_corridor_bag \
    --bag /home/<you>/rosbags/<your_corridor_bag>.mcap \
    --output-dir corridor_eval_data --subsample 5

# 2. tar + scp corridor_eval_data/ to $SCRATCH on HPC

# 3. Re-run teachers, build manifest, fine-tune
sbatch ml_pipeline/eval_corridor.slurm

Jetson deployment

# Build engines on the Jetson (or any TRT-capable host)
python export_trt.py --checkpoint best.pt              # FP32 + FP16 + INT8

# Latency / GPU-mem / depth-RMSE micro-benchmark
python benchmark_jetson.py --engine-dir exported/

The engine then plugs into Student TRT Node in NCHSB.

Repository layout

ml_inference/
├── README.md                       ← this file
├── config.py                       ← central Config dataclass
├── requirements.txt                ← Python deps
├── setup_hpc.sh                    ← one-time HPC env setup
│
├── train.py                        ← student training loop
├── train.slurm                     ← SLURM job for default training
├── train_iter6.slurm               ← Kendall uncertainty + per-task ckpts
├── train_iter7.slurm               ← TUM RGB-D experiment
├── train_iter7b_b2.slurm           ← EfficientViT-B2 ablation
│
├── eval_distillation.py            ← student vs teacher (RMSE / AbsRel / δ / mIoU)
├── eval_corridor_da3.py            ← DA3-Small zero-shot on the corridor
├── eval_corridor_depth.py          ← Student depth on the corridor
├── eval_corridor_v4.slurm          ← SLURM for V4-era corridor eval
├── eval_corridor.slurm             ← SLURM for B1/B2 corridor eval
├── eval_nyu_da3.{py,slurm}         ← DA3-Small zero-shot on NYU val
├── eval_nearrange_safety.py        ← Near-range (0-1.5 m) safety analysis
├── fpr_audit.py                    ← FPR origin classification
├── temporal_consistency.py         ← Costmap stability across frames
├── compute_paper_stats.py          ← Aggregates per-experiment JSONs into paper_stats.json
│
├── calibration_sensitivity.py      ← Reviewer-response calibration ablation
├── costmap_builder.py              ← Costmap construction for ablation
├── inflation.py                    ← Inflation logic
├── run_costmap_ablation.py         ← Full costmap ablation harness
├── extract_lilocbench.py           ← LILocBench frame extraction
├── corridor_sam2_seg.slurm         ← SAM2 corridor seg labels
├── costmap_ablation.slurm          ← SLURM for full ablation
│
├── export_trt.py                   ← ONNX + TensorRT FP32/FP16/INT8
├── benchmark_jetson.py             ← TRT engine micro-benchmark
├── print_model_shapes.py           ← Encoder feature-map verification utility
│
├── generate_paper_figures.py       ← Figures from rosbag + checkpoints
├── generate_demo_videos.py         ← Individual model comparison videos (1280×720)
├── generate_grid_video.py          ← Synchronized 2×3 / 2×4 grid comparison videos
├── generate_corridor_missing.py    ← OOM-safe sequential model inference for corridor
├── create_full_comparison.py       ← Side-by-side comparison panels
├── create_paper_fig2.py            ← Fig. 2 generator
├── extract_bag_frames.py           ← Frame extraction from rosbag2 (.db3)
├── extract_corridor_frames.py      ← Corridor-specific extraction
├── extract_glass_corridor.py       ← Glass-corridor scene extraction
├── find_worst_frame{,_simple}.py   ← Worst-case-frame finders
├── run_da3_on_frames.py            ← DA3 inference on raw frames
├── run_depth_comparison.py         ← Per-frame depth comparison
├── run_student_evaluation.py       ← Aggregate student eval
├── da3_glass_corridor.py           ← DA3 on glass corridor scene
│
├── pipeline_lilocbench.slurm       ← End-to-end LILocBench pipeline
│
├── dataset/
│   ├── nyu_loader.py               ← NYU Depth V2 (HuggingFace datasets, pinned <4)
│   ├── corridor_loader.py          ← Corridor data loader
│   ├── lilocbench_loader.py        ← Bonn LILocBench loader
│   ├── tum_loader.py               ← TUM RGB-D loader
│   └── label_remapper.py           ← 894 → 40 → 6 class remapping
│
├── models/
│   ├── student.py                  ← EfficientViT-B1 backbone-agnostic + dual decoders
│   └── losses.py                   ← Hybrid depth (ToF/DA3) + CE seg + edge smoothness
│
├── teacher_infer/
│   ├── run_da3.py                  ← DA3-Metric-Large depth teacher
│   ├── run_sam2.py                 ← YOLO-seeded SAM2-Large + geometric labeler
│   ├── verify_teacher_output.py    ← Pre-run scale/sanity gate
│   ├── build_manifest.py           ← Emits manifest.jsonl
│   ├── extract_corridor_bag.py     ← Local bag → frame folder + manifest
│   ├── prep_tum.py                 ← TUM RGB-D preparation
│   ├── teacher_infer.slurm         ← SLURM for NYU teachers
│   └── teacher_infer_tum.slurm     ← SLURM for TUM teachers
│
├── results/                        ← Versioned evaluation outputs
│   ├── paper_stats.json            ← Aggregated Tables III–VI
│   ├── nyu_da3_da3-small_val.json  ← DA3-Small NYU eval
│   ├── pixel_fusion.json           ← Per-frame fusion comparison
│   ├── nearrange_safety.json       ← Near-range RMSE breakdown
│   ├── fpr_audit.json              ← FPR origin decomposition
│   ├── temporal_consistency.json   ← Frame-to-frame stability
│   └── costmap_ablation/           ← Per-config inflation radii + per-frame metrics
│
├── Dockerfile                      ← ML inference / evaluation container
├── docker-compose.yml              ← Multi-service reproducibility harness
│
├── docs/                           ← Jekyll site (nishant-zfyii.github.io/ml_inference)
│   ├── _config.yml                 ← Jekyll config
│   ├── index.md                    ← Project overview
│   ├── training.md                 ← V1 → V9 training lineage
│   ├── evaluation.md               ← Depth metrics, costmap ablation
│   ├── calibration.md              ← Reviewer-requested calibration study
│   ├── deployment.md               ← ONNX / TRT export, Jetson benchmarks
│   ├── datasets.md                 ← Data inventory and provenance
│   ├── videos.md                   ← Demo video generation pipeline
│   └── docker.md                   ← Docker usage guide
│
├── archive/
│   ├── README.md                   ← Archive index
│   └── v1-v3-baseline/             ← Frozen V1-V3 codebase (MobileNetV3 + DA2)
│
└── assets/                         ← Figures referenced by this README

Docker reproducibility

The fastest path from clone to results. Verified end-to-end on 2026-05-10 — docker build, smoke-test, and eval-corridor (459 frames, V9, CPU) all pass. The full verification log lives in docs/docker.md.

Model weights and evaluation data are volume-mounted, not baked into the image. Image is ~6.3 GB (PyTorch wheel dominates).

# Build — pass --network=host on networks with restricted DNS (e.g. NYU campus)
docker build --network=host -t ml-inference .

# Smoke test (no data needed)
docker compose run --rm smoke-test
# → Model forward pass OK: depth (1, 1, 240, 320), seg (1, 6, 240, 320)

# Corridor depth evaluation (~3 min CPU on 459 frames)
docker compose run --rm eval-corridor
# → RMSE 1.366 m, sensor dead-pixel rate 79.7%

# Calibration sensitivity experiment
docker compose run --rm calibration

# Demo videos and grid comparisons
docker compose run --rm demo-videos
docker compose run --rm grid-videos

Service	What it runs	GPU needed
`smoke-test`	Model architecture + forward pass check	No
`eval-corridor`	V9 depth on 459 corridor frames	No (CPU), faster with GPU
`calibration`	Calibration sensitivity sweep (N = 1–100 frames)	No
`demo-videos`	Individual model comparison videos (1280×720, XVID)	No
`demo-videos-gpu`	Same, GPU-accelerated inference	Yes
`grid-videos`	Synchronized 2×3 / 2×4 grid comparison videos	No

For GPU passthrough: docker compose run demo-videos-gpu (requires NVIDIA Container Toolkit).

Engineering blog

The project blog walks through each component in long-form: the model lineage with one page per training configuration (V1 through V9, what changed and why), the architecture diagrams (training, student, runtime fusion as Mermaid), the costmap ablation, the calibration sensitivity study, deployment notes for the Jetson runtime, the demo video generation pipeline, and a decisions log documenting the relationship between the formal specification and the deployment realization. The README is the reference card; the blog is the technical journal.

Reproducibility notes

Bag location. The corridor rosbag (rgbd_imu_20260228_003828_0.mcap, ~8.1 GB) lives on the local Linux extraction host, not on HPC. Frame extraction (teacher_infer/extract_corridor_bag.py) runs locally; the resulting corridor_eval_data/ directory is tar/scp'd to $SCRATCH/corridor_eval_data/ on HPC where the SLURM jobs read it. See eval_corridor.slurm:21-29 for the exact handoff.

HPC environment.

Setting	Value
Cluster	NYU Torch HPC (`login.torch.hpc.nyu.edu`)
Partition	`l40s_public` (default; verify with `sinfo`)
GPU	`gpu:l40s:1`
Account	`torch_pr_742_general`
Module	`anaconda3/2025.06`
Conda env	`$SCRATCH/conda_envs/nchsb_ml` (created by `setup_hpc.sh`)

Pinned dataset library. The NYU Depth V2 HF dataset still uses a loading script, which means datasets >= 4.0 will refuse to load it. requirements.txt pins datasets >= 2.14, < 4.0.

Recovering the V1-V3 codebase. Two paths, both reproducible:

# As browsable files at the top of main:
ls archive/v1-v3-baseline/

# As a complete checkout of the V1-V3 working tree:
git checkout v1-v3-baseline

Acknowledgements

Compute: NYU HPC Torch cluster (torch_pr_742_general).
Foundation models used as teachers and runtime: Depth Anything V3 (DA3-Metric-Large + DA3-Small), SAM2-Large, YOLOv8.
Datasets: NYU Depth V2 (Silberman et al.), SUN RGB-D, DIODE, LILocBench, TUM RGB-D.
Sibling repository: ROS 2 runtime nodes, Nav2 configuration, and Gazebo simulation harness live in NCHSB.

Author attribution is omitted from this README during the active review window. Full author and contributor information will be restored after the review process completes.

License

MIT — see LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ml_inference

Overview

Project highlights

Architecture

The student lineage (V1 → V9)

Selection guidance

Results

DA3-Small zero-shot on NYU val (Table VII, teacher row)

Pixel-level depth on the corridor (459 frames)

Costmap recovery (Table III, n = 459 frames)

Limitations and disclosures

Quick start

Local CPU validation (laptop)

HPC training (NYU Torch)

Corridor specialisation (V7 / V9)

Jetson deployment

Repository layout

Docker reproducibility

Engineering blog

Reproducibility notes

Acknowledgements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
archive		archive
assets		assets
dataset		dataset
docs		docs
huggingface_cards		huggingface_cards
models		models
results		results
teacher_infer		teacher_infer
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
analyze_ablation.py		analyze_ablation.py
benchmark_jetson.py		benchmark_jetson.py
calibration_sensitivity.py		calibration_sensitivity.py
compute_paper_stats.py		compute_paper_stats.py
config.py		config.py
corridor_sam2_seg.slurm		corridor_sam2_seg.slurm
costmap_ablation.slurm		costmap_ablation.slurm
costmap_builder.py		costmap_builder.py
create_full_comparison.py		create_full_comparison.py
create_paper_fig2.py		create_paper_fig2.py
da3_glass_corridor.py		da3_glass_corridor.py
docker-compose.yml		docker-compose.yml
eval_corridor.slurm		eval_corridor.slurm
eval_corridor_da3.py		eval_corridor_da3.py
eval_corridor_da3.slurm		eval_corridor_da3.slurm
eval_corridor_depth.py		eval_corridor_depth.py
eval_corridor_v4.slurm		eval_corridor_v4.slurm
eval_distillation.py		eval_distillation.py
eval_nearrange_safety.py		eval_nearrange_safety.py
eval_nyu_da3.py		eval_nyu_da3.py
eval_nyu_da3.slurm		eval_nyu_da3.slurm
export_trt.py		export_trt.py
extract_bag_frames.py		extract_bag_frames.py
extract_corridor_frames.py		extract_corridor_frames.py
extract_glass_corridor.py		extract_glass_corridor.py
extract_lilocbench.py		extract_lilocbench.py
find_worst_frame.py		find_worst_frame.py
find_worst_frame_simple.py		find_worst_frame_simple.py
fpr_audit.py		fpr_audit.py
generate_corridor_missing.py		generate_corridor_missing.py
generate_demo_videos.py		generate_demo_videos.py
generate_grid_video.py		generate_grid_video.py
generate_paper_figures.py		generate_paper_figures.py
inflation.py		inflation.py
pipeline_lilocbench.slurm		pipeline_lilocbench.slurm
print_model_shapes.py		print_model_shapes.py
regenerate_da3_depth.py		regenerate_da3_depth.py
requirements.txt		requirements.txt
run_costmap_ablation.py		run_costmap_ablation.py
run_da3_on_frames.py		run_da3_on_frames.py
run_depth_comparison.py		run_depth_comparison.py
run_student_evaluation.py		run_student_evaluation.py
setup_hpc.sh		setup_hpc.sh
step1_download_lilocbench.sh		step1_download_lilocbench.sh
temporal_consistency.py		temporal_consistency.py
train.py		train.py
train.slurm		train.slurm
train_iter6.slurm		train_iter6.slurm
train_iter7.slurm		train_iter7.slurm
train_iter7b_b2.slurm		train_iter7b_b2.slurm

Folders and files

Latest commit

History

Repository files navigation

ml_inference

Overview

Project highlights

Architecture

The student lineage (V1 → V9)

Selection guidance

Results

DA3-Small zero-shot on NYU val (Table VII, teacher row)

Pixel-level depth on the corridor (459 frames)

Costmap recovery (Table III, n = 459 frames)

Limitations and disclosures

Quick start

Local CPU validation (laptop)

HPC training (NYU Torch)

Corridor specialisation (V7 / V9)

Jetson deployment

Repository layout

Docker reproducibility

Engineering blog

Reproducibility notes

Acknowledgements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages