Experiments and benchmarks for the LROS project
On the Orange Pi Ultra that we have, cores 0-3 are Cortex-A55 (in-order, more power-efficient) while cores 4-7 are Cortex-A76 (out-of-order, more performant).
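For reproducible benchmarks it can help to pin a workload to one cluster. A minimal sketch using `taskset`; the benchmark binary name is a placeholder:

```sh
# Pin to the efficient in-order A55 cluster (cores 0-3)
taskset -c 0-3 ./run_benchmark

# Pin to the performant out-of-order A76 cluster (cores 4-7)
taskset -c 4-7 ./run_benchmark
```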
Clone with `git clone [url] --recursive` or execute `git submodule update --init --recursive` to initialize the submodules.
To get the models, run the `scripts/download_models.sh` script.
- vaccel (Need: meson >= 1.1)
```sh
cd vaccel
cd scripts/common; git apply ../../submodules.patch; cd ../..
meson setup --buildtype=release build
meson compile -C build
meson install -C build --destdir=out
sed -i "s/prefix=/prefix=\/home\/$(whoami)\/lros-expe\/vaccel\/build\/out/" /home/$(whoami)/lros-expe/vaccel/build/out/usr/local/lib/pkgconfig/vaccel.pc
```
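To verify that the relocated pkg-config file resolves (using the same out directory as the sed command above):

```sh
# Should print vaccel's include and linker flags without errors
PKG_CONFIG_PATH=/home/$(whoami)/lros-expe/vaccel/build/out/usr/local/lib/pkgconfig \
    pkg-config --cflags --libs vaccel
```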
- lros-qemu (Need: python3-tomli, libglib2.0-dev)
```sh
cd lros-qemu
mkdir build
cd build
CFLAGS=-Wno-error PKG_CONFIG_PATH=/home/$(whoami)/lros-expe/vaccel/build/out/usr/local/lib/aarch64-linux-gnu/pkgconfig ../configure --target-list=aarch64-softmmu --enable-virtfs
make -j
```
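A quick smoke test of the freshly built binary (run from the build directory):

```sh
# Should print the QEMU version, confirming the aarch64-softmmu target built
./qemu-system-aarch64 --version
```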
- llama.cpp
```sh
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
```
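The binaries end up in `build/bin`. A quick check that inference works; the model path is an example:

```sh
# Generate a few tokens to verify the build; point -m at any GGUF model you have
./build/bin/llama-cli -m models/model.gguf -p "Hello" -n 16
```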
- Find suitable adapters for the model you want to run inference on: for example, for the Llama-3.1-1b-Instruct model you can use Llama-TOS and MentalChat-16K.
- Clone the repo containing the `adapter_config.json` and `adapter_model.safetensors` files.
- Convert the LoRA into GGUF format using the `convert_lora_to_gguf.py` script from the llama.cpp repo:
  - Install the requirements: `pip install -r requirements/requirements-convert_lora_to_gguf.txt`
  - Run the conversion: `./convert_lora_to_gguf.py --outfile <lora-name>.gguf --outtype f16 <cloned lora repo>`
- Start llama-server with the LoRAs: add `--lora-scaled path/to/lora.gguf 0` for every LoRA you want to supply.
  - Note: It should be possible to just pass `--lora path/to/lora.gguf` and additionally add `--lora-init-without-apply`, but that did not work in my tests.
- Modify the applied LoRA(s) in one of two ways (see the sketch after this list):
  - A `POST` request to `/lora-adapters`, supplying `[{"id": 0, "scale": 0.2},{"id": 1, "scale": 0.8}]` as the request body (LoRAs not included are automatically scaled to 0).
  - A per-request `lora` parameter in the JSON request body, containing an array like the one above.
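A minimal end-to-end sketch of the above, assuming llama-server listens on its default port 8080; the model and LoRA paths are placeholders:

```sh
# Start the server with two LoRAs registered at scale 0
./build/bin/llama-server -m models/base-model.gguf \
    --lora-scaled loras/lora-a.gguf 0 \
    --lora-scaled loras/lora-b.gguf 0 &

# Re-scale the loaded adapters globally via POST /lora-adapters
curl -X POST http://localhost:8080/lora-adapters \
    -H "Content-Type: application/json" \
    -d '[{"id": 0, "scale": 0.2},{"id": 1, "scale": 0.8}]'

# Or set scales per request via a "lora" array in the completion body
curl http://localhost:8080/completion \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello", "n_predict": 16, "lora": [{"id": 0, "scale": 1.0}]}'
```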