aashishbishow/midi-ish


Cinematic MIDI Generator (PyTorch)

midi-ish trains a tiny neural model on a folder of MIDI files, then generates a full piano “cinematic” cue (melody + chord progression + bass + arpeggios) that builds, resolves, and ends naturally.

  • Audio preview: generated_song.mp3
  • MIDI output: generated_song.mid

This repo intentionally keeps everything minimal: the entire pipeline lives in main.py.


The biggest flex

This project is deliberately anti-bloat:

  • No Transformers. No CNNs. No RNNs. No LSTMs.
    • The model is a small MLP with a wave nonlinearity (sin(linear(x))) and a compact latent-space block (pinn) that transforms note features.
  • No huge dataset required (it scales down).
    • It trains on whatever you put in midi_dataset/ — from a handful of MIDI files to a bigger folder. The architecture is intentionally lightweight.
  • No heavy computation.
    • Small network, mini-batch training, and fast generation. This is meant to run locally without a monster GPU budget.
  • Music-aware by construction.
    • The generator isn’t just “sample tokens and pray.” It explicitly builds harmony, bass, arpeggios, melody, dynamics, and an outro cadence so the piece lands like a cue.
  • Not a giant probabilistic pattern-matcher.
    • There’s randomness for variation, but the sound comes from structured musical rules + a compact learned representation, not heavyweight probabilistic sequence modeling.

If you want “big model = big result,” that’s a different project. This one is about doing a lot with a little.
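The wave nonlinearity described above can be sketched in a few lines of PyTorch. Layer sizes and the class name here are illustrative, not the repo's exact architecture:

```python
import torch
import torch.nn as nn

class WaveMLP(nn.Module):
    """A small MLP whose hidden activation is sin(linear(x))
    instead of ReLU, plus a compact latent block."""

    def __init__(self, in_dim=8, hidden=64, out_dim=8):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)  # compact latent-space block
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        x = torch.sin(self.fc1(x))  # wave nonlinearity
        x = torch.sin(self.fc2(x))
        return self.out(x)

model = WaveMLP()
y = model(torch.randn(4, 8))
print(y.shape)  # torch.Size([4, 8])
```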


What’s in this repo?

  • main.py
    • Loads the dataset, trains the model, and generates a new piece.
  • midi_dataset/
    • A folder of .mid files used as training data.
  • generated_song.mid
    • A sample generated MIDI from the current code.
  • generated_song.mp3
    • A rendered audio preview of the generated MIDI.

Quickstart

1) Install dependencies

You need Python + pip, then:

pip install torch pretty_midi

Notes:

  • If torch installation fails, install it from the official PyTorch selector for your platform.
  • pretty_midi pulls in MIDI parsing/writing dependencies automatically.

2) Run training + generation

From the repo root:

python main.py

You’ll see:

  • the training tensor shape
  • periodic loss prints
  • the random seed index chosen
  • the inferred key/mode/BPM
  • MIDI generated: generated_song.mid

Listening

Option A: Listen to the audio preview

Open generated_song.mp3 in any media player.

Option B: Open the MIDI

Open generated_song.mid in:

  • a DAW (Ableton, FL Studio, Logic, Reaper, etc.)
  • notation software (MuseScore)
  • a MIDI player that supports a piano SoundFont

If an online MIDI player is silent, it often means it’s not actually rendering audio (or it needs a SoundFont). Try a different player or load it into a DAW.


How it works (deep dive)

main.py does three big things:

  1. MIDI → training tensors
  2. Train a next-step predictor
  3. Generate a song-like arrangement (cinematic style)

1) MIDI → tensors

For each MIDI file in midi_dataset/, the script extracts note events from one instrument program (default: instrument=0, typically piano).

Each note becomes an 8D feature vector:

[ frequency_hz,
  amplitude,
  duration_seconds,
  happy,
  sad,
  calm,
  tense,
  instrument_flag ]

Key details:

  • frequency_hz comes from MIDI pitch via pretty_midi.note_number_to_hz().
  • amplitude is velocity normalized to 0..1.
  • duration_seconds is note.end - note.start.
  • The emotion values are currently fixed defaults (they’re conditioning inputs, not labels).
  • Everything is forced to float32 so the tensors match PyTorch layer dtypes.
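The feature layout can be sketched without any MIDI dependency; the frequency formula below is the same equal-temperament conversion that `pretty_midi.note_number_to_hz()` computes, and the emotion defaults are illustrative, not the constants main.py uses:

```python
def note_to_features(pitch, velocity, start, end,
                     emotion=(1.0, 0.0, 1.0, 0.0), instrument_flag=0.0):
    """Build the 8D feature vector described above for one note."""
    freq_hz = 440.0 * 2 ** ((pitch - 69) / 12)  # MIDI pitch -> Hz
    amplitude = velocity / 127.0                # velocity normalized to 0..1
    duration = end - start                      # note.end - note.start
    return [freq_hz, amplitude, duration, *emotion, instrument_flag]

feats = note_to_features(pitch=69, velocity=100, start=0.0, end=0.5)
print(feats[0])  # 440.0 (A4)
```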

2) Train a next-step predictor

The dataset is converted into pairs:

  • X_train = note at time t
  • Y_train = note at time t+1

The training target is essentially: “given the current note + conditioning, predict the next note.”

Mini-batching is used (via DataLoader) so the model trains over the full dataset without trying to process everything in one giant forward pass.
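The pairing and mini-batch loop can be sketched as follows; the `Linear` stand-in, batch size, and learning rate are placeholders, not the repo's actual model or hyperparameters:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Illustrative: an (N, 8) float32 tensor of note features.
notes = torch.randn(1000, 8)

X_train = notes[:-1]  # note at time t
Y_train = notes[1:]   # note at time t+1

loader = DataLoader(TensorDataset(X_train, Y_train),
                    batch_size=256, shuffle=True)

model = torch.nn.Linear(8, 8)  # stand-in for the real model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for xb, yb in loader:          # one epoch over the full dataset
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
```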

3) Generation: cinematic arrangement, not just random notes

This is the part that makes the output feel like a cue instead of a beep test.

The generator creates multiple musical layers:

Harmony (chords)

  • Chooses a key from the seed note
  • Uses a minor mode by default for cinematic color
  • Uses a song form (intro/verse/chorus/bridge/chorus/outro)
  • Rotates chord progressions per section
  • Adds tasteful extensions (e.g., add9/sus colors) more often in big sections
  • Holds a final tonic chord through the ending
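The per-section progression rotation can be sketched with diatonic triads in A minor. The degree choices and section tables below are illustrative, not the repo's exact progressions:

```python
# MIDI pitches of one octave of A natural minor (A3..G4).
A_MINOR = [57, 59, 60, 62, 64, 65, 67]

def triad(degree):
    """Stack thirds on a scale degree: root, third, fifth."""
    return [A_MINOR[(degree + i) % 7] + 12 * ((degree + i) // 7)
            for i in (0, 2, 4)]

SECTION_PROGRESSIONS = {
    "intro":  [0, 5],        # i - VI
    "verse":  [0, 5, 2, 6],  # i - VI - III - VII
    "chorus": [0, 3, 5, 6],  # i - iv - VI - VII
    "outro":  [0],           # hold the tonic through the ending
}

for section, degrees in SECTION_PROGRESSIONS.items():
    print(section, [triad(d) for d in degrees])
```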

Accompaniment

  • Pad chord layer (sustained harmony)
  • Bass on strong beats (thickened with fifth/octave support in the chorus)
  • Arpeggios
    • lighter (8ths) in intro/verse
    • denser (16ths) in chorus/bridge
    • fades out at the very end so the cadence lands clearly

Melody

  • Stays in a comfortable piano register
  • Quantized to the chosen scale
  • Strong beats bias toward chord tones
  • Weak beats allow stepwise passing motion
  • Adds rare chromatic approach tones for tension
  • In the outro: pushes melodic motion downward and forces a clear tonic resolution
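Scale quantization plus the register clamp can be sketched like this; the register bounds and the tie-break toward the lower neighbor are assumptions:

```python
A_MINOR_PCS = {9, 11, 0, 2, 4, 5, 7}  # pitch classes of A natural minor

def quantize_to_scale(pitch, lo=48, hi=84):
    """Clamp to a comfortable piano register, then snap to the
    nearest in-scale MIDI pitch (preferring downward on ties)."""
    pitch = max(lo, min(hi, pitch))
    for offset in range(12):
        for candidate in (pitch - offset, pitch + offset):
            if candidate % 12 in A_MINOR_PCS and lo <= candidate <= hi:
                return candidate
    return pitch

print(quantize_to_scale(61))  # 61 is C#; snaps down to 60 (C)
```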

Dynamics / “film arc”

  • Applies a global swell (crescendo) over the piece
  • Then fades in the outro, while thinning texture
  • Sometimes octave-doubles melody in the late “climax” section
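The swell-then-fade arc amounts to a velocity envelope over time. The constants below are illustrative; main.py shapes velocity its own way:

```python
def velocity_envelope(t, total, peak=112, floor=40, outro_frac=0.15):
    """Crescendo toward a late climax, then fade through the outro."""
    outro_start = total * (1 - outro_frac)
    if t < outro_start:
        frac = t / outro_start                        # crescendo phase
        return int(floor + (peak - floor) * frac)
    frac = (t - outro_start) / (total - outro_start)  # fade phase
    return int(peak - (peak - floor) * frac)

print([velocity_envelope(t, 100) for t in (0, 50, 85, 100)])
# [40, 82, 112, 40] — quiet start, building, climax, fade to quiet
```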

Where the neural net fits in

The neural model influences:

  • melodic pitch target tendencies
  • velocity shaping
  • note lengths

…but the generator bounds and musicalizes the raw output so it stays playable, audible, and song-like.
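The "bounds and musicalizes" step can be sketched as a set of clamps on the raw network output; the exact ranges in main.py may differ:

```python
def musicalize(raw_pitch, raw_vel, raw_dur):
    """Clamp raw model output into playable, audible ranges."""
    pitch = int(max(48, min(84, round(raw_pitch))))           # piano register
    velocity = int(max(30, min(120, round(raw_vel * 127))))   # never silent
    duration = max(0.1, min(2.0, raw_dur))                    # audible lengths
    return pitch, velocity, duration

print(musicalize(200.7, 1.4, 0.01))  # (84, 120, 0.1)
```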


Controls you can tweak (in main.py)

Training

  • epochs (default: 50)
  • batch_size (default: 4096)
  • LR (learning rate inside the optimizer)

Dataset parsing

  • midi_to_tensor(... instrument=0, emotion=[...])
    • change instrument if your dataset uses a different program number
    • change emotion vector if you want different conditioning

Generation

The call at the bottom of the file is the main switchboard:

  • seconds=None
    • derive duration from the song form + a final hold
    • set a number (e.g., seconds=180) if you want a hard duration
  • bpm=84
    • slower = bigger cinematic feel
  • step_seconds=0.5
    • larger = more spacious melody rhythm
  • polyphony=3
    • affects chord thickness
  • style="cinematic"
    • currently the main style preset
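With seconds=None, the duration falls out of the song form and the tempo. A rough sketch of that arithmetic (the section lengths below are assumptions, not the repo's tables):

```python
def form_duration_seconds(bars_per_section, bpm=84, beats_per_bar=4):
    """Total bars in the form, times seconds per beat."""
    seconds_per_beat = 60.0 / bpm
    total_bars = sum(bars_per_section.values())
    return total_bars * beats_per_bar * seconds_per_beat

form = {"intro": 4, "verse": 8, "chorus": 8, "bridge": 4,
        "chorus2": 8, "outro": 4}
print(round(form_duration_seconds(form), 1))  # 102.9 seconds at 84 BPM
```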

Pro tip: run multiple times. The script uses randomness (seed selection + musical choices), so you get a new cue each run.


Troubleshooting

“I hear nothing” when playing the MIDI

  • Some online players don’t render audio (or require a SoundFont).
  • Try a DAW, MuseScore, or a player that supports SoundFonts.
  • Check that the notes have non-zero velocity (this project clamps velocities to avoid silent output).

ModuleNotFoundError: pretty_midi

pip install pretty_midi

RuntimeError: mat1 and mat2 must have the same dtype

  • This happens when inputs are float64 but model weights are float32.
  • The project forces tensors to float32 during dataset creation.

It trains slowly

  • Reduce epochs.
  • If you have a GPU and CUDA-enabled PyTorch, it will use it automatically.

Limitations (by design)

  • The “emotion” conditioning is currently constant (not derived from labels).
  • The model is a simple next-step predictor; the cinematic structure mostly comes from the generator logic.
  • This is a creative coding experiment, not a polished music model.

Repro tips

If you want the same output twice, add these near the top of main.py:

random.seed(0)
torch.manual_seed(0)

(You’ll also want to fix the seed_idx selection.)


License

You can do whatever the hell you wanna do.

About

A lightweight Python project that trains a small PyTorch “WavePINN” model on a MIDI dataset and generates a cinematic-style piano cue
