[ICLR'26] Loc²: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching

[arXiv] [BibTeX]

📝 Abstract

We propose an accurate and interpretable fine-grained cross-view localization method that estimates the 3 Degrees of Freedom (DoF) pose of a ground-level image by matching its local features with a reference aerial image. Unlike prior approaches that rely on global descriptors or bird's-eye-view (BEV) transformations, our method directly learns ground-aerial image-plane correspondences using weak supervision from camera poses. The matched ground points are lifted into BEV space with monocular depth predictions, and scale-aware Procrustes alignment is then applied to estimate camera rotation, translation, and optionally the scale between relative depth and the aerial metric space. This formulation is lightweight, end-to-end trainable, and requires no pixel-level annotations. Experiments show state-of-the-art accuracy in challenging scenarios such as cross-area testing and unknown orientation. Furthermore, our method offers strong interpretability: correspondence quality directly reflects localization accuracy and enables outlier rejection via RANSAC, while overlaying the re-scaled ground layout on the aerial image provides an intuitive visual cue of localization performance.
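The alignment step described in the abstract can be illustrated in a few lines. The following is a minimal sketch, not the repository's implementation: it assumes the matched ground points have already been lifted to 2D BEV coordinates, ignores per-match weights, and uses the closed-form 2D least-squares solution for a similarity transform. The function name `fit_similarity_2d` and its arguments are hypothetical.

```python
import math


def fit_similarity_2d(src, dst, estimate_scale=True):
    """Least-squares fit of dst ~= scale * R(theta) * src + t for 2D point pairs.

    src: BEV points lifted from the ground image (hypothetical input format).
    dst: matched points in the aerial image, in metric units.
    Returns (scale, theta_in_radians, (tx, ty)).
    """
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n

    # Accumulate dot and cross products of the centered pairs.
    dot = cross = norm = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
        norm += px * px + py * py
    if norm == 0.0:
        raise ValueError("source points are degenerate (all identical)")

    # Optimal rotation angle; the scale is only estimated when requested,
    # mirroring the "optionally the scale" wording of the abstract.
    theta = math.atan2(cross, dot)
    scale = math.hypot(dot, cross) / norm if estimate_scale else 1.0

    # Translation maps the rotated, scaled source centroid onto the target centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - scale * (c * sx - s * sy)
    ty = dy - scale * (s * sx + c * sy)
    return scale, theta, (tx, ty)


# Toy example: dst is src rotated by 90 degrees, scaled by 2, shifted by (5, -3).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
dst = [(5.0, -3.0), (5.0, -1.0), (1.0, -3.0), (3.0, 3.0)]
scale, theta, (tx, ty) = fit_similarity_2d(src, dst)
```

In a RANSAC loop, a fit like this would be run on small random subsets of the correspondences, and the hypothesis with the most inliers would be kept.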

📦 Checkpoints

📁 Download pretrained models

🗂️ Data Preparation

VIGOR

Please download and prepare the VIGOR dataset by following the instructions in the official repository.

KITTI

Please download and organize the KITTI dataset according to the directory structure used in HighlyAccurate.

📊 Evaluation

Run all commands from the repository root.

VIGOR

python eval_vigor.py --area <samearea|crossarea> --random_orientation <0|180> --ransac True --model_path <checkpoint>

Checkpoints:

checkpoints/vigor/samearea/known_ori/model.pt
checkpoints/vigor/samearea/unknown_ori/model.pt
checkpoints/vigor/crossarea/known_ori/model.pt
checkpoints/vigor/crossarea/unknown_ori/model.pt
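To evaluate all four VIGOR settings in one go, the command lines can be generated programmatically. The sketch below only builds the argument lists. It assumes that `--random_orientation 0` pairs with the `known_ori` checkpoints and `--random_orientation 180` with the `unknown_ori` ones, which the folder names suggest but this README does not state. The helper name `vigor_eval_command` is hypothetical.

```python
from itertools import product

# Assumed pairing between the flag value and the checkpoint folder name.
ORIENTATION_DIRS = {0: "known_ori", 180: "unknown_ori"}
AREAS = ("samearea", "crossarea")


def vigor_eval_command(area, random_orientation):
    """Build the eval_vigor.py argument list for one evaluation setting."""
    if area not in AREAS:
        raise ValueError(f"unknown area: {area!r}")
    if random_orientation not in ORIENTATION_DIRS:
        raise ValueError(f"unsupported orientation range: {random_orientation!r}")
    checkpoint = (
        f"checkpoints/vigor/{area}/{ORIENTATION_DIRS[random_orientation]}/model.pt"
    )
    return [
        "python", "eval_vigor.py",
        "--area", area,
        "--random_orientation", str(random_orientation),
        "--ransac", "True",
        "--model_path", checkpoint,
    ]


# One command per (area, orientation) combination, in a fixed order.
commands = [vigor_eval_command(a, o) for a, o in product(AREAS, (0, 180))]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the checkpoints are in place.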

KITTI

python eval_kitti.py --rotation_range <10|180> --max_depth 40 --model_path <checkpoint>

Checkpoints:

checkpoints/kitti/ori_noise10/model.pt
checkpoints/kitti/ori_noise180/model.pt
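Before launching a KITTI evaluation, it can help to confirm that the expected checkpoint files exist. The sketch below assumes that `--rotation_range 10` corresponds to `ori_noise10` and `--rotation_range 180` to `ori_noise180`, as the folder names suggest. Both helper functions are hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumed mapping from --rotation_range to the checkpoint listed above.
KITTI_CHECKPOINTS = {
    10: "checkpoints/kitti/ori_noise10/model.pt",
    180: "checkpoints/kitti/ori_noise180/model.pt",
}


def kitti_eval_command(rotation_range, max_depth=40):
    """Build the eval_kitti.py argument list for one rotation range."""
    if rotation_range not in KITTI_CHECKPOINTS:
        raise ValueError(f"unsupported rotation range: {rotation_range!r}")
    return [
        "python", "eval_kitti.py",
        "--rotation_range", str(rotation_range),
        "--max_depth", str(max_depth),
        "--model_path", KITTI_CHECKPOINTS[rotation_range],
    ]


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in KITTI_CHECKPOINTS.values() if not (root / rel).is_file()]
```

Calling `missing_checkpoints()` from the repository root returns an empty list when both files have been downloaded.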

Output

Evaluation results are saved to results/ by default. Use --results_dir /path/to/output_dir to override it.

🚀 Training

VIGOR

python train_vigor.py --area <samearea|crossarea> --random_orientation <0|180>

Optional arguments:

--batch_size (default: 80)
--learning_rate (default: 1e-4)
--max_depth (default: 35)
--beta (default: 1.0)
--loss_grid_size (default: 5.0)
--temperature (default: 0.1)
--epoch_to_resume (epoch from which to resume training)

Training checkpoints are saved to ../checkpoints/ and metrics are saved to ../results/.
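When sweeping hyperparameters, it can be convenient to build the training command from the documented flags and pass only the values that differ from their defaults. This is a minimal sketch under that assumption. The helper `vigor_train_command` is hypothetical, and how `train_vigor.py` parses each value is not verified here.

```python
# Defaults as documented in the optional-argument list above.
DOCUMENTED_DEFAULTS = {
    "batch_size": 80,
    "learning_rate": 1e-4,
    "max_depth": 35,
    "beta": 1.0,
    "loss_grid_size": 5.0,
    "temperature": 0.1,
}


def vigor_train_command(area, random_orientation, epoch_to_resume=None, **overrides):
    """Build the train_vigor.py argument list, adding only non-default overrides."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise ValueError(f"undocumented arguments: {sorted(unknown)}")
    cmd = [
        "python", "train_vigor.py",
        "--area", area,
        "--random_orientation", str(random_orientation),
    ]
    for name, value in overrides.items():
        # Skip values equal to the documented default to keep the command short.
        if value != DOCUMENTED_DEFAULTS[name]:
            cmd += [f"--{name}", str(value)]
    if epoch_to_resume is not None:
        cmd += ["--epoch_to_resume", str(epoch_to_resume)]
    return cmd


# Example: cross-area training with unknown orientation and a smaller batch.
example = vigor_train_command("crossarea", 180, batch_size=32, max_depth=35)
```

Rejecting undocumented names early avoids launching a long training run with a mistyped flag.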

✅ To-Do

Citation

@inproceedings{xia2026loc,
  title={{Loc}$^2$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching},
  author={Xia, Zimin and Xu, Chenghao and Alahi, Alexandre},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
