midi-ish: train a tiny neural model on a folder of MIDI files, then generate a full piano “cinematic” cue (melody + chord progression + bass + arpeggios) that builds, resolves, and ends naturally.
- Audio preview: `generated_song.mp3`
- MIDI output: `generated_song.mid`
This repo intentionally keeps everything minimal: the entire pipeline lives in `main.py`.
This project is deliberately anti-bloat:
- No Transformers. No CNNs. No RNNs. No LSTMs. The model is a small MLP with a wave nonlinearity (`sin(linear(x))`) and a compact latent-space block (`pinn`) that transforms note features.
- No huge dataset required (it scales down). It trains on whatever you put in `midi_dataset/` — from a handful of MIDI files to a bigger folder. The architecture is intentionally lightweight.
- No heavy computation. Small network, mini-batch training, and fast generation. This is meant to run locally without a monster GPU budget.
- Music-aware by construction. The generator isn’t just “sample tokens and pray.” It explicitly builds harmony, bass, arpeggios, melody, dynamics, and an outro cadence so the piece lands like a cue.
- Not a giant probabilistic pattern-matcher. There’s randomness for variation, but the sound comes from structured musical rules + a compact learned representation, not heavyweight probabilistic sequence modeling.
If you want “big model = big result,” that’s a different project. This one is about doing a lot with a little.
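As a rough sketch of that architecture (layer sizes and the internals of the `pinn` block here are assumptions, not the exact code in `main.py`):

```python
import torch
import torch.nn as nn

class WaveMLP(nn.Module):
    """Minimal sketch: an MLP whose hidden activations are sin(linear(x)).
    Layer sizes and the `pinn` block's internals are assumptions."""
    def __init__(self, dim=8, hidden=64):
        super().__init__()
        self.inp = nn.Linear(dim, hidden)
        self.pinn = nn.Linear(hidden, hidden)  # compact latent-space block (assumed shape)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x):
        h = torch.sin(self.inp(x))   # wave nonlinearity: sin(linear(x))
        h = torch.sin(self.pinn(h))
        return self.out(h)

model = WaveMLP()
y = model(torch.randn(4, 8))  # batch of 4 note feature vectors in, 4 out
```

The whole thing is a few thousand parameters, which is the point.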
- `main.py` - Loads the dataset, trains the model, and generates a new piece.
- `midi_dataset/` - A folder of `.mid` files used as training data.
- `generated_song.mid` - A sample generated MIDI from the current code.
- `generated_song.mp3` - A rendered audio preview of the generated MIDI.
You need Python + pip, then:

```shell
pip install torch pretty_midi
```

Notes:
- If `torch` installation fails, install it from the official PyTorch selector for your platform.
- `pretty_midi` pulls in MIDI parsing/writing dependencies automatically.
From the repo root:

```shell
python main.py
```

You’ll see:
- the training tensor shape
- periodic loss prints
- the random seed index chosen
- the inferred key/mode/BPM

and finally:

```
MIDI generated: generated_song.mid
```
Open generated_song.mp3 in any media player.
Open generated_song.mid in:
- a DAW (Ableton, FL Studio, Logic, Reaper, etc.)
- notation software (MuseScore)
- a MIDI player that supports a piano SoundFont
If an online MIDI player is silent, it often means it’s not actually rendering audio (or it needs a SoundFont). Try a different player or load it into a DAW.
main.py does three big things:
- MIDI → training tensors
- Train a next-step predictor
- Generate a song-like arrangement (cinematic style)
For each MIDI file in `midi_dataset/`, the script extracts note events from one instrument program (default: `instrument=0`, typically piano).
Each note becomes an 8D feature vector:
[ frequency_hz,
amplitude,
duration_seconds,
happy,
sad,
calm,
tense,
instrument_flag ]
Key details:
- `frequency_hz` comes from MIDI pitch via `pretty_midi.note_number_to_hz()`.
- `amplitude` is velocity normalized to `0..1`.
- `duration_seconds` is `note.end - note.start`.
- The emotion values are currently fixed defaults (they’re conditioning inputs, not labels).
- Everything is forced to `float32` so the tensors match PyTorch layer dtypes.
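Putting those details together, one note’s feature vector could be built like this (the emotion defaults and the flag value are illustrative, not the project’s exact constants):

```python
from types import SimpleNamespace

def note_to_features(note, emotion=(0.5, 0.1, 0.7, 0.2), instrument_flag=1.0):
    """Sketch of the 8D feature vector; the emotion defaults and flag value
    here are illustrative, not main.py's exact constants."""
    freq = 440.0 * 2.0 ** ((note.pitch - 69) / 12.0)  # what pretty_midi.note_number_to_hz computes
    return [
        freq,                    # frequency_hz
        note.velocity / 127.0,   # amplitude, normalized to 0..1
        note.end - note.start,   # duration_seconds
        *emotion,                # happy, sad, calm, tense
        float(instrument_flag),  # instrument_flag
    ]

# Example: A4 at velocity 100, half a second long.
features = note_to_features(SimpleNamespace(pitch=69, velocity=100, start=0.0, end=0.5))
```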
The dataset is converted into pairs:
- `X_train` = note at time `t`
- `Y_train` = note at time `t+1`
The training target is essentially: “given the current note + conditioning, predict the next note.”
Mini-batching is used (via `DataLoader`) so the model trains over the full dataset without trying to process everything in one giant forward pass.
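A minimal sketch of the pairing and mini-batching (the batch size here is illustrative; the project’s default is 4096):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Sketch: `notes` stands in for the (N, 8) float32 tensor built from the dataset.
notes = torch.rand(100, 8, dtype=torch.float32)

X_train = notes[:-1]  # note at time t
Y_train = notes[1:]   # note at time t+1

# Illustrative batch size; the project's default is 4096.
loader = DataLoader(TensorDataset(X_train, Y_train), batch_size=32, shuffle=True)
for xb, yb in loader:
    pass  # one optimizer step per mini-batch would go here
```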
This is the part that makes the output feel like a cue instead of a beep test.
The generator creates multiple musical layers:
- Harmony
  - Chooses a key from the seed note
  - Uses a minor mode by default for cinematic color
  - Uses a song form (intro/verse/chorus/bridge/chorus/outro)
  - Rotates chord progressions per section
  - Adds tasteful extensions (e.g., add9/sus colors) more often in big sections
  - Holds a final tonic chord through the ending
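The per-section chord rotation can be sketched as a table of scale degrees plus stacked thirds (the progression tables and section names below are illustrative assumptions, not `main.py`’s actual values):

```python
# Sketch of per-section chord rotation in a natural-minor key. The progression
# tables and section names are illustrative assumptions, not main.py's values.
MINOR_SCALE = [0, 2, 3, 5, 7, 8, 10]  # semitone offsets of the natural minor scale

PROGRESSIONS = {
    "intro":  [0, 5, 2, 6],  # i - VI - III - VII
    "chorus": [0, 3, 4, 5],  # i - iv - v - VI
}

def triad(tonic_midi, degree):
    """Stack two scale thirds above the given scale degree."""
    return [tonic_midi + MINOR_SCALE[(degree + s) % 7] + 12 * ((degree + s) // 7)
            for s in (0, 2, 4)]

# A minor (tonic MIDI 57): the chorus progression as lists of MIDI notes.
chords = [triad(57, d) for d in PROGRESSIONS["chorus"]]
```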
- Pad chord layer (sustained harmony)
- Bass on strong beats (with chorus thickening like fifth/octave support)
- Arpeggios
  - lighter (8ths) in intro/verse
  - denser (16ths) in chorus/bridge
  - fade out at the very end so the cadence lands clearly
- Melody
  - Stays in a comfortable piano register
  - Quantized to the chosen scale
  - Strong beats bias toward chord tones
  - Weak beats allow stepwise passing motion
  - Adds rare chromatic approach tones for tension
  - In the outro: pushes melodic motion downward and forces a clear tonic resolution
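Scale quantization with a chord-tone bias on strong beats might look roughly like this (function and constant names are hypothetical):

```python
# Hypothetical sketch of scale quantization with a strong-beat chord-tone bias.
A_MINOR_PCS = {0, 2, 3, 5, 7, 8, 10}  # pitch classes relative to the tonic

def quantize(pitch, tonic=57, chord_pcs=frozenset({0, 3, 7}), strong_beat=False):
    """Snap a raw MIDI pitch to the nearest allowed pitch class."""
    allowed = chord_pcs if strong_beat else A_MINOR_PCS
    pc = (pitch - tonic) % 12
    # pick the allowed pitch class closest to the raw pitch (circular distance)
    best = min(allowed, key=lambda a: min((a - pc) % 12, (pc - a) % 12))
    delta = ((best - pc + 6) % 12) - 6  # signed shortest move, in semitones
    return pitch + delta
```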
- Dynamics
  - Applies a global swell (crescendo) over the piece
  - Fades during the outro while thinning the texture
  - Sometimes octave-doubles the melody in the late “climax” section
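The swell-then-fade behavior could be sketched as a single velocity curve (the curve shapes and constants here are assumptions, not `main.py`’s exact values):

```python
# Illustrative sketch of the swell-then-fade dynamics; curve shapes and
# constants are assumptions, not main.py's exact values.
def velocity_at(t, total, outro_start, base=60, peak=110):
    """Linear crescendo over the piece, fading once the outro begins."""
    v = base + (peak - base) * (t / total)  # global swell
    if t >= outro_start:
        fade = 1.0 - (t - outro_start) / (total - outro_start)
        v *= max(fade, 0.2)                 # never fully silent
    return max(1, min(127, int(v)))         # clamp to valid MIDI velocity
```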
The neural model influences:
- melodic pitch target tendencies
- velocity shaping
- note lengths
…but the generator bounds and musicalizes the raw output so it stays playable, audible, and song-like.
Key training knobs:
- `epochs` (default: 50)
- `batch_size` (default: 4096)
- `LR` (learning rate inside the optimizer)
`midi_to_tensor(..., instrument=0, emotion=[...])`
- change `instrument` if your dataset uses a different program number
- change the `emotion` vector if you want different conditioning
The call at the bottom of the file is the main switchboard:
- `seconds=None`: derive duration from the song form + a final hold; set a number (e.g., `seconds=180`) if you want a hard duration
- `bpm=84`: slower = bigger cinematic feel
- `step_seconds=0.5`: larger = more spacious melody rhythm
- `polyphony=3`: affects chord thickness
- `style="cinematic"`: currently the main style preset
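The knobs above, collected in one place for reference (the function name `generate_song` is an assumption; check the actual call at the bottom of `main.py`):

```python
# The knobs above as one settings dict. The function name `generate_song`
# is an assumption; check the actual call at the bottom of main.py.
settings = dict(
    seconds=None,        # None = derive duration from the song form + a final hold
    bpm=84,              # slower = bigger cinematic feel
    step_seconds=0.5,    # larger = more spacious melody rhythm
    polyphony=3,         # affects chord thickness
    style="cinematic",   # the main style preset
)
# generate_song(**settings)  # hypothetical call shape
```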
Pro tip: run multiple times. The script uses randomness (seed selection + musical choices), so you get a new cue each run.
- Some online players don’t render audio (or require a SoundFont).
- Try a DAW, MuseScore, or a player that supports SoundFonts.
- Check that the notes have non-zero velocity (this project clamps velocities to avoid silent output).
- Missing `pretty_midi` module: `pip install pretty_midi`
- Dtype mismatch errors happen when inputs are `float64` but model weights are `float32`; the project forces tensors to `float32` during dataset creation, so this shouldn’t normally occur.
- Training too slow? Reduce `epochs`.
- If you have a GPU and CUDA-enabled PyTorch, it will be used automatically.
- The “emotion” conditioning is currently constant (not derived from labels).
- The model is a simple next-step predictor; the cinematic structure mostly comes from the generator logic.
- This is a creative coding experiment, not a polished music model.
If you want the same output twice, add these near the top of main.py:

```python
random.seed(0)
torch.manual_seed(0)
```

(You’ll also want to fix the `seed_idx` selection.)
You can do whatever the hell you wanna do.