Try the model here. Check out a video showcase here.
Mapperatorinator is a multi-model framework that uses spectrogram inputs to generate fully featured osu! beatmaps for all gamemodes. The goal of this project is to automatically generate rankable-quality osu! beatmaps from any song with a high degree of customizability.
This project is built upon osuT5 and osu-diffusion. In developing this, I spent about 2500 hours of GPU compute across 142 runs on my 4060 Ti and rented 4090 instances on vast.ai.
Use this tool responsibly. Always disclose the use of AI in your beatmaps. Do not upload the generated beatmaps.
The instructions below let you generate beatmaps on your local machine, or you can run inference in the cloud with the Colab notebook.
git clone https://github.com/OliBomby/Mapperatorinator.git
cd Mapperatorinator
python -m venv .venv
# In cmd.exe
.venv\Scripts\activate.bat
# In PowerShell
.venv\Scripts\Activate.ps1
# In Linux or MacOS
source .venv/bin/activate
Install Python 3.10, Git, ffmpeg, and PyTorch, then install the remaining Python dependencies:
pip install -r requirements.txt
Run inference.py and pass in some arguments to generate beatmaps. Arguments use Hydra override syntax. See inference.yaml for all available parameters.
python inference.py \
audio_path [Path to input audio] \
output_path [Path to output directory] \
beatmap_path [Path to .osu file to autofill metadata, audio_path, and output_path, or use as reference] \
gamemode [Game mode to generate 0=std, 1=taiko, 2=ctb, 3=mania] \
difficulty [Difficulty star rating to generate] \
mapper_id [Mapper user ID for style] \
year [Upload year to simulate] \
hitsounded [Whether to add hitsounds] \
slider_multiplier [Slider velocity multiplier] \
circle_size [Circle size] \
keycount [Key count for mania] \
hold_note_ratio [Hold note ratio for mania 0-1] \
scroll_speed_ratio [Scroll speed ratio for mania and ctb 0-1] \
descriptors [List of OMDB descriptors for style] \
negative_descriptors [List of OMDB negative descriptors for classifier-free guidance] \
add_to_beatmap [Whether to add generated content to the reference beatmap instead of making a new beatmap] \
start_time [Generation start time in milliseconds] \
end_time [Generation end time in milliseconds] \
in_context [List of additional context to provide to the model [NONE,TIMING,KIAI,MAP,GD,NO_HS]] \
output_type [List of content types to generate] \
cfg_scale [Scale of the classifier-free guidance] \
super_timing [Whether to use slow accurate variable BPM timing generator] \
seed [Random seed for generation] \
Example:
python inference.py beatmap_path="'C:\Users\USER\AppData\Local\osu!\Songs\1 Kenji Ninuma - DISCO PRINCE\Kenji Ninuma - DISCOPRINCE (peppy) [Normal].osu'" gamemode=0 difficulty=5.5 year=2023 descriptors="['jump aim','clean']" in_context=[TIMING,KIAI]
- You can edit configs/inference_v29.yaml and add your arguments there instead of typing them in the terminal every time.
- All available descriptors can be found here.
- Always provide a year argument between 2007 and 2023. If you leave it unknown, the model might generate with an inconsistent style.
- Always provide a difficulty argument. If you leave it unknown, the model might generate with an inconsistent difficulty.
- Increase the cfg_scale parameter to increase the effectiveness of the mapper_id and descriptors arguments.
- You can use the negative_descriptors argument to guide the model away from certain styles. This only works when cfg_scale > 1.
- If your song style and desired beatmap style don't match well, the model might not follow your directions. For example, it's hard to generate a high-SR, high-SV beatmap for a calm song.
- If you already have timing and kiai times done for a song, you can give these to the model to greatly increase inference speed and accuracy: use the beatmap_path and in_context=[TIMING,KIAI] arguments.
- To remap just a part of your beatmap, use the beatmap_path, start_time, end_time, and add_to_beatmap=true arguments (see the example command below).
- To generate a guest difficulty for a beatmap, use the beatmap_path and in_context=[GD,TIMING,KIAI] arguments.
- To generate hitsounds for a beatmap, use the beatmap_path and in_context=[NO_HS,TIMING,KIAI] arguments.
- To generate only timing for a song, use the super_timing=true and output_type=[TIMING] arguments.
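Putting the partial-remap tip into a concrete command (the path and times below are placeholders):
python inference.py beatmap_path="'C:\path\to\your\beatmap.osu'" start_time=30000 end_time=60000 add_to_beatmap=true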
Mapperatorinator converts osu! beatmaps into an intermediate event representation that can be directly converted to and from tokens. This representation includes hit objects, hitsounds, slider velocities, new combos, timing points, kiai times, and taiko/mania scroll speeds.
Here is a small example of the tokenization process:
To save on vocabulary size, time events are quantized to 10ms intervals and position coordinates are quantized to 32 pixel grid points.
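As a minimal sketch of this quantization (illustrative helpers, not the project's actual tokenizer code):
TIME_STEP_MS = 10  # time events are quantized to 10 ms intervals
POS_GRID = 32      # positions are quantized to 32-pixel grid points

def quantize_time(t_ms: float) -> int:
    # Index of the 10 ms bin this time falls into.
    return round(t_ms / TIME_STEP_MS)

def quantize_pos(x: float) -> int:
    # Nearest 32-pixel grid point.
    return round(x / POS_GRID) * POS_GRID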
The model is essentially a wrapper around the HF Transformers Whisper model, with custom input embeddings and a custom loss function. The model has 219M parameters. This architecture was found to be faster and more accurate than T5 for this task.
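A rough sketch of constructing such a model with Hugging Face Transformers; the hyperparameters below are placeholders, not the real 219M-parameter configuration:
from transformers import WhisperConfig, WhisperForConditionalGeneration

# Placeholder dimensions; the real model swaps in custom input embeddings,
# a custom loss function, and its own event vocabulary.
config = WhisperConfig(
    vocab_size=4096,            # size of the event vocabulary (placeholder)
    d_model=768,
    encoder_layers=12,
    decoder_layers=12,
    encoder_attention_heads=12,
    decoder_attention_heads=12,
)
model = WhisperForConditionalGeneration(config)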
The high-level overview of the model's input-output is as follows:
The model uses Mel spectrogram frames as encoder input, with one frame per input position. The decoder output at each step is a softmax distribution over a discrete, predefined vocabulary of events. Outputs are sparse: events are only emitted when a hit object occurs, instead of annotating every single audio frame.
Before the SOS token are additional tokens that facilitate conditional generation. These tokens include the gamemode, difficulty, mapper ID, year, and other metadata. During training, these tokens have no accompanying labels, so they are never output by the model. Each metadata token also has a random chance of being replaced by an 'unknown' token during training, so at inference time we can use these 'unknown' tokens to reduce the amount of metadata we have to give the model.
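A sketch of how such a conditioning prefix could be assembled during training (the token names and drop probability are made up):
import random

def build_prefix(metadata: dict, drop_prob: float = 0.1) -> list[str]:
    # Conditioning tokens placed before the SOS token. Each metadata token
    # is randomly replaced by an 'unknown' token so the model also learns
    # to generate when that piece of metadata is withheld at inference.
    prefix = []
    for key in ("gamemode", "difficulty", "mapper_id", "year"):
        token = f"<{key}:{metadata[key]}>" if random.random() >= drop_prob else "<unknown>"
        prefix.append(token)
    return prefix + ["<sos>"]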
The context length of the model is 8.192 seconds. This is obviously not enough to generate a full beatmap, so we split the song into multiple windows and generate the beatmap in small parts. To make sure the generated beatmap has no noticeable seams between windows, we use a 90% overlap and generate the windows sequentially. Each generation window except the first starts with the decoder pre-filled up to 50% of the generation window with tokens from the previous windows. A logit processor ensures the model can't generate time tokens that fall in the first 50% of the generation window. Additionally, the last 40% of the generation window is reserved for the next window: any generated time tokens in that range are treated as EOS tokens. This ensures that each generated token is conditioned on at least 4 seconds of previous tokens and 3.3 seconds of future audio to anticipate.
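The window layout from the paragraph above, as a small sketch (the helper name is made up; the percentages are from the text):
CONTEXT_S = 8.192                 # model context length in seconds
PREFILL = 0.5                     # decoder pre-filled with previous tokens
RESERVED = 0.4                    # tail reserved for the next window
STRIDE_S = CONTEXT_S * (1.0 - PREFILL - RESERVED)  # only 10% is new per window

def window_starts(song_length_s: float) -> list[float]:
    # Start times of the sequential, 90%-overlapping generation windows.
    starts, t = [], 0.0
    while t < song_length_s:
        starts.append(t)
        t += STRIDE_S
    return starts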
To prevent offset drifting during long generation, random offsets have been added to time events in the decoder during training. This forces it to correct timing errors by listening to the onsets in the audio instead, and results in a consistently accurate offset.
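In data-augmentation terms, this could look roughly like the following (the offset range is a guess, and whether the offset is shared or per-event is not specified here):
import random

def jitter_time_events(times_ms: list[int], max_offset_ms: int = 30) -> list[int]:
    # Shift decoder time events by a random offset so the model must
    # re-anchor timing to audio onsets instead of trusting past tokens.
    offset = random.randint(-max_offset_ms, max_offset_ms)
    return [t + offset for t in times_ms]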
Position coordinates generated by the decoder are quantized to 32 pixel grid points, so afterward we use diffusion to denoise the coordinates to the final positions. For this we trained a modified version of osu-diffusion that is specialized to only the last 10% of the noise schedule, and accepts the more advanced metadata tokens that Mapperatorinator uses for conditional generation.
Since the Mapperatorinator model outputs the SV of sliders, the required length of the slider is fixed regardless of the shape of the control point path. Therefore, we try to guide the diffusion process to create coordinates that fit the required slider lengths. We do this by recalculating the slider end positions after every step of the diffusion process based on the required length and the current control point path. This means that the diffusion process does not have direct control over the slider end positions, but it can still influence them by changing the control point path.
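A sketch of that end-position recalculation, assuming a simple polyline path (the real code handles osu!'s curve types):
import math

def slider_end(path: list[tuple[float, float]], required_length: float) -> tuple[float, float]:
    # Walk along the control point path and return the point at the
    # required length, so the slider end always matches the generated SV.
    if required_length <= 0:
        return path[0]
    remaining = required_length
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if 0 < remaining <= seg:
            t = remaining / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        remaining -= seg
    return path[-1]  # path is shorter than the required length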
Mapperatorinator does some extra post-processing to improve the quality of the generated beatmap:
- Refine position coordinates with diffusion.
- Resnap time events to the nearest tick using the snap divisors generated by the model (sketched after this list).
- Snap near-perfect positional overlaps.
- Convert mania column events to X coordinates.
- Generate slider paths for taiko drumrolls.
- Fix big discrepancies in required slider length and control point path length.
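For instance, the resnapping step can be illustrated like this (hypothetical numbers for a 180 BPM timing point):
def resnap(t_ms: float, offset_ms: float, beat_len_ms: float, snap_divisor: int) -> float:
    # Snap a time event to the nearest tick of the given snap divisor.
    tick = beat_len_ms / snap_divisor
    return offset_ms + round((t_ms - offset_ms) / tick) * tick

# 180 BPM -> beat length 60000 / 180 ms; 1/4 snapping:
# resnap(10010, 0, 60000 / 180, 4) == 10000.0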
The super timing generator is an algorithm that improves the precision and accuracy of generated timing by inferring timing for the whole song 20 times and averaging the results. This is useful for songs with variable BPM or with BPM changes. The result is almost perfect, with only the occasional section needing manual adjustment.
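Conceptually, the averaging looks like this (illustrative only; it assumes each run yields the same number of aligned beat times):
from statistics import mean

def super_timing(infer_timing, audio, runs: int = 20) -> list[float]:
    # Run the timing generator several times and average the per-beat
    # estimates to smooth out run-to-run noise.
    results = [infer_timing(audio) for _ in range(runs)]  # each: list of beat times
    return [mean(beats) for beats in zip(*results)]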
The instructions below create a training environment on your local machine.
git clone https://github.com/OliBomby/Mapperatorinator.git
cd Mapperatorinator
Create your own dataset using the Mapperator console app. It requires an osu! OAuth client token to verify beatmaps and get additional metadata. Place the dataset in the datasets directory next to the Mapperatorinator directory.
Mapperator.ConsoleApp.exe dataset2 -t "/Mapperatorinator/datasets/beatmap_descriptors.csv" -i "path/to/osz/files" -o "/datasets/cool_dataset"
Training in your venv is also possible, but we recommend using Docker on WSL for better performance.
docker compose up -d --force-recreate
docker attach mapperatorinator_space
All configurations are located in ./configs/osut5/train.yaml. Begin training by calling osuT5/train.py.
python osuT5/train.py -cn train_v29 train_dataset_path="/workspace/datasets/cool_dataset" test_dataset_path="/workspace/datasets/cool_dataset" train_dataset_end=90 test_dataset_start=90 test_dataset_end=100
Special thanks to:
- The authors of osuT5 for their training code.
- Hugging Face team for their tools.
- Jason Won and Richard Nagyfi for bouncing ideas.
- Marvin for donating training credits.
- The osu! community for the beatmaps.
- osu! Beatmap Generator by Syps (Nick Sypteras)
- osumapper by kotritrona, jyvden, Yoyolick (Ryan Zmuda)
- osu-diffusion by OliBomby (Olivier Schipper), NiceAesth (Andrei Baciu)
- osuT5 by gyataro (Xiwen Teoh)
- Beat Learning by sedthh (Richard Nagyfi)
- osu!dreamer by jaswon (Jason Won)