
VLAM-based navigation for mining vehicles in Carla UE5.5 simulator

Demo of using Vision Language Action Models for autonomous navigation between waypoints using simple road signs, running in the Mine01 offroad mining world in the new Carla UE5.5 simulator. The user can give an instruction in natural language like:

"Drive the vehicle to Mine B."

The VLAM receives an onboard image at each waypoint, decides the direction to drive, determines whether the destination has been reached, and provides its reasoning. A waypoint handler translates this information into the next waypoint for a Carla agent; once the vehicle arrives there, the VLAM kicks in again.
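
Conceptually, the loop looks roughly like this sketch (hypothetical names, not the exact classes used in this repository):

# Sketch of the per-waypoint decision loop; all names here are hypothetical.
def navigate(instruction, vehicle, vlam, waypoint_handler):
    while True:
        image = vehicle.capture_front_image()      # onboard image at the current waypoint
        decision = vlam.query(image, instruction)  # structured JSON: direction, arrival, reasoning
        print(decision["reasoning"])
        if decision["destination_reached"]:        # VLAM decides the goal is reached
            break
        # Translate the chosen direction into the next waypoint for the Carla agent
        vehicle.drive_to(waypoint_handler.next_waypoint(decision["direction"]))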

Demo video

Demo images with VLAM output


Models

This package can be used either with a cloud VLM via the Google AI API or with a locally hosted model behind a vLLM OpenAI-compatible endpoint. It implements structured JSON outputs for both backends to ensure correctly formatted responses. Tests with Google's gemini-2.0-flash-exp showed very reliable results, while smaller VLMs like Mistral AI's Pixtral-12B worked most of the time.
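
For illustration, the structured output could be enforced with a schema along these lines (the field names are assumptions, not necessarily this package's exact schema):

from pydantic import BaseModel

# Hypothetical response schema for the VLAM decision
class VLAMDecision(BaseModel):
    reasoning: str             # the model's explanation of what it sees
    direction: str             # e.g. "left", "right" or "straight"
    destination_reached: bool  # True once the instructed destination is reached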

Installation

All tests have been done with Carla UE5.5 0.10.0 on Ubuntu 22.04 and NVIDIA GPUs. Because the road signs are added to the simulator, Carla has to be built from source with the Unreal Editor. Follow the steps described here to build Carla UE5.5 on a Linux machine, then add the signs from carla/objects/ to the Mine01 simulation environment in Unreal Editor. The Carla UE5.5 build should also install the matching version of the carla Python package. Check that everything is working and continue by installing this package:

git clone https://github.com/leon-seidel/vlam-drive.git
cd vlam-drive
pip install -e .
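
Once the simulator is running, a quick connection test with the carla package should confirm the setup (a sketch using the standard carla client API and the default port 2000):

import carla

# Connect to a running Carla server and print the loaded map
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()
print("Connected to map:", world.get_map().name)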

Configuration

First choose whether to use the Google AI API or a self-hosted vLLM model as the backend. For the Google API, a GOOGLE_API_KEY has to be acquired from here. You can also choose one of the available Google models via GOOGLE_MODEL_NAME, though all testing has been done with gemini-2.0-flash-exp.
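
For reference, a JSON-mode request against the Gemini API could look roughly like this (a sketch using the google-generativeai SDK; the prompt and image handling are illustrative, not this package's exact code):

import os
import google.generativeai as genai
from PIL import Image

# Sketch only: ask Gemini for a JSON-formatted answer about a waypoint image
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel(
    os.environ.get("GOOGLE_MODEL_NAME", "gemini-2.0-flash-exp"),
    generation_config={"response_mime_type": "application/json"},
)
image = Image.open("waypoint.png")
response = model.generate_content(["Which direction leads to Mine B?", image])
print(response.text)  # JSON string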

When using vLLM, VLLM_BASE_URL may have to be adjusted when the model runs on another machine or port. Mistral AI's Pixtral-12B has been tested and works with the given setup; this FP8-quantized version of Pixtral runs on a single RTX 3090:

vllm serve neuralmagic/pixtral-12b-FP8-dynamic --max-model-len 16384
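
Since vLLM exposes an OpenAI-compatible endpoint, a request against it could look roughly like this (a sketch; the prompt and base64 image handling are illustrative, not this package's exact code):

import base64
from openai import OpenAI

# Sketch only: query the local vLLM endpoint via the OpenAI-compatible API
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
with open("waypoint.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
response = client.chat.completions.create(
    model="neuralmagic/pixtral-12b-FP8-dynamic",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "Which direction leads to Mine B? Answer in JSON."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ]}],
)
print(response.choices[0].message.content)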

All settings for this package can be edited in a .env file or using environment variables:

# Vision Language Model Settings
VLAM_BACKEND=google                       # Options: google, vllm
GOOGLE_API_KEY=your_key_here              # Get from https://aistudio.google.com/app/apikey
GOOGLE_MODEL_NAME=gemini-2.0-flash-exp    # Google Gemini model version
VLLM_BASE_URL=http://localhost:8000/v1    # URL for local vLLM deployment

# Image Settings
SAVE_IMAGES=false                         # Save images with VLAM results
SHOW_IMAGES=true                          # Show images with VLAM results
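
These are plain environment variables, so they can be read in the usual way (a sketch assuming python-dotenv; the actual settings loader in this package may differ):

import os
from dotenv import load_dotenv

# Sketch: read settings from a .env file or the environment
load_dotenv()  # values already set in the environment take precedence
backend = os.getenv("VLAM_BACKEND", "google")
show_images = os.getenv("SHOW_IMAGES", "true").lower() == "true"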

Running

Start Carla UE5.5 with:

./CarlaUnreal.sh

You can then run the VLAM drive navigation with a custom instruction in natural language referring to the three available destinations Mine A, Mine B, and Mine C:

python run_vlam_drive.py --instruction "Please drive the vehicle to Mine A."

Happy driving!
