aashishbishow/midi-ish


Cinematic MIDI Generator (PyTorch)

midi-ish trains a tiny neural model on a folder of MIDI files, then generates a full piano “cinematic” cue (melody + chord progression + bass + arpeggios) that builds, resolves, and ends naturally.

  • Audio preview: generated_song.mp3
  • MIDI output: generated_song.mid

This repo intentionally keeps everything minimal: the entire pipeline lives in main.py.


The biggest flex

This project is deliberately anti-bloat:

  • No Transformers. No CNNs. No RNNs. No LSTMs.
    • The model is a small MLP with a wave nonlinearity (sin(linear(x))) and a compact latent-space block (pinn) that transforms note features.
  • No huge dataset required (it scales down).
    • It trains on whatever you put in midi_dataset/ — from a handful of MIDI files to a bigger folder. The architecture is intentionally lightweight.
  • No heavy computation.
    • Small network, mini-batch training, and fast generation. This is meant to run locally without a monster GPU budget.
  • Music-aware by construction.
    • The generator isn’t just “sample tokens and pray.” It explicitly builds harmony, bass, arpeggios, melody, dynamics, and an outro cadence so the piece lands like a cue.
  • Not a giant probabilistic pattern-matcher.
    • There’s randomness for variation, but the sound comes from structured musical rules + a compact learned representation, not heavyweight probabilistic sequence modeling.

If you want “big model = big result,” that’s a different project. This one is about doing a lot with a little.
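The wave nonlinearity described above can be sketched in a few lines of PyTorch. Layer sizes and the class name here are illustrative, not the repo's exact architecture:

```python
import torch
import torch.nn as nn

class WaveMLP(nn.Module):
    """A small MLP whose hidden activation is sin(linear(x))
    instead of ReLU, plus a compact latent block."""

    def __init__(self, in_dim=8, hidden=64, out_dim=8):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)  # compact latent-space block
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        x = torch.sin(self.fc1(x))  # wave nonlinearity
        x = torch.sin(self.fc2(x))
        return self.out(x)

model = WaveMLP()
y = model(torch.randn(4, 8))
print(y.shape)  # torch.Size([4, 8])
```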


What’s in this repo?

  • main.py
    • Loads the dataset, trains the model, and generates a new piece.
  • midi_dataset/
    • A folder of .mid files used as training data.
  • generated_song.mid
    • A sample generated MIDI from the current code.
  • generated_song.mp3
    • A rendered audio preview of the generated MIDI.

Quickstart

1) Install dependencies

You need Python + pip, then:

pip install torch pretty_midi

Notes:

  • If torch installation fails, install it from the official PyTorch selector for your platform.
  • pretty_midi pulls in MIDI parsing/writing dependencies automatically.

2) Run training + generation

From the repo root:

python main.py

You’ll see:

  • the training tensor shape
  • periodic loss prints
  • the random seed index chosen
  • the inferred key/mode/BPM
  • MIDI generated: generated_song.mid

Listening

Option A: Listen to the audio preview

Open generated_song.mp3 in any media player.

Option B: Open the MIDI

Open generated_song.mid in:

  • a DAW (Ableton, FL Studio, Logic, Reaper, etc.)
  • notation software (MuseScore)
  • a MIDI player that supports a piano SoundFont

If an online MIDI player is silent, it often means it’s not actually rendering audio (or it needs a SoundFont). Try a different player or load it into a DAW.


How it works (deep dive)

main.py does three big things:

  1. MIDI → training tensors
  2. Train a next-step predictor
  3. Generate a song-like arrangement (cinematic style)

1) MIDI → tensors

For each MIDI file in midi_dataset/, the script extracts note events from one instrument program (default: instrument=0, typically piano).

Each note becomes an 8D feature vector:

[ frequency_hz,
  amplitude,
  duration_seconds,
  happy,
  sad,
  calm,
  tense,
  instrument_flag ]

Key details:

  • frequency_hz comes from MIDI pitch via pretty_midi.note_number_to_hz().
  • amplitude is velocity normalized to 0..1.
  • duration_seconds is note.end - note.start.
  • The emotion values are currently fixed defaults (they’re conditioning inputs, not labels).
  • Everything is forced to float32 so the tensors match PyTorch layer dtypes.
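The feature layout can be sketched without any MIDI dependency; the frequency formula below is the same equal-temperament conversion that `pretty_midi.note_number_to_hz()` computes, and the emotion defaults are illustrative, not the constants main.py uses:

```python
def note_to_features(pitch, velocity, start, end,
                     emotion=(1.0, 0.0, 1.0, 0.0), instrument_flag=0.0):
    """Build the 8D feature vector described above for one note."""
    freq_hz = 440.0 * 2 ** ((pitch - 69) / 12)  # MIDI pitch -> Hz
    amplitude = velocity / 127.0                # velocity normalized to 0..1
    duration = end - start                      # note.end - note.start
    return [freq_hz, amplitude, duration, *emotion, instrument_flag]

feats = note_to_features(pitch=69, velocity=100, start=0.0, end=0.5)
print(feats[0])  # 440.0 (A4)
```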

2) Train a next-step predictor

The dataset is converted into pairs:

  • X_train = note at time t
  • Y_train = note at time t+1

The training target is essentially: “given the current note + conditioning, predict the next note.”

Mini-batching is used (via DataLoader) so the model trains over the full dataset without trying to process everything in one giant forward pass.
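The pairing and mini-batch loop can be sketched as follows; the `Linear` stand-in, batch size, and learning rate are placeholders, not the repo's actual model or hyperparameters:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Illustrative: an (N, 8) float32 tensor of note features.
notes = torch.randn(1000, 8)

X_train = notes[:-1]  # note at time t
Y_train = notes[1:]   # note at time t+1

loader = DataLoader(TensorDataset(X_train, Y_train),
                    batch_size=256, shuffle=True)

model = torch.nn.Linear(8, 8)  # stand-in for the real model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for xb, yb in loader:          # one epoch over the full dataset
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
```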

3) Generation: cinematic arrangement, not just random notes

This is the part that makes the output feel like a cue instead of a beep test.

The generator creates multiple musical layers:

Harmony (chords)

  • Chooses a key from the seed note
  • Uses a minor mode by default for cinematic color
  • Uses a song form (intro/verse/chorus/bridge/chorus/outro)
  • Rotates chord progressions per section
  • Adds tasteful extensions (e.g., add9/sus colors) more often in big sections
  • Holds a final tonic chord through the ending
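The per-section progression rotation can be sketched with diatonic triads in A minor. The degree choices and section tables below are illustrative, not the repo's exact progressions:

```python
# MIDI pitches of one octave of A natural minor (A3..G4).
A_MINOR = [57, 59, 60, 62, 64, 65, 67]

def triad(degree):
    """Stack thirds on a scale degree: root, third, fifth."""
    return [A_MINOR[(degree + i) % 7] + 12 * ((degree + i) // 7)
            for i in (0, 2, 4)]

SECTION_PROGRESSIONS = {
    "intro":  [0, 5],        # i - VI
    "verse":  [0, 5, 2, 6],  # i - VI - III - VII
    "chorus": [0, 3, 5, 6],  # i - iv - VI - VII
    "outro":  [0],           # hold the tonic through the ending
}

for section, degrees in SECTION_PROGRESSIONS.items():
    print(section, [triad(d) for d in degrees])
```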

Accompaniment

  • Pad chord layer (sustained harmony)
  • Bass on strong beats (thickened with fifth/octave support in the chorus)
  • Arpeggios
    • lighter (8ths) in intro/verse
    • denser (16ths) in chorus/bridge
    • fades out at the very end so the cadence lands clearly

Melody

  • Stays in a comfortable piano register
  • Quantized to the chosen scale
  • Strong beats bias toward chord tones
  • Weak beats allow stepwise passing motion
  • Adds rare chromatic approach tones for tension
  • In the outro: pushes melodic motion downward and forces a clear tonic resolution
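Scale quantization plus the register clamp can be sketched like this; the register bounds and the tie-break toward the lower neighbor are assumptions:

```python
A_MINOR_PCS = {9, 11, 0, 2, 4, 5, 7}  # pitch classes of A natural minor

def quantize_to_scale(pitch, lo=48, hi=84):
    """Clamp to a comfortable piano register, then snap to the
    nearest in-scale MIDI pitch (preferring downward on ties)."""
    pitch = max(lo, min(hi, pitch))
    for offset in range(12):
        for candidate in (pitch - offset, pitch + offset):
            if candidate % 12 in A_MINOR_PCS and lo <= candidate <= hi:
                return candidate
    return pitch

print(quantize_to_scale(61))  # 61 is C#; snaps down to 60 (C)
```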

Dynamics / “film arc”

  • Applies a global swell (crescendo) over the piece
  • Then fades in the outro, while thinning texture
  • Sometimes octave-doubles melody in the late “climax” section
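The swell-then-fade arc amounts to a velocity envelope over time. The constants below are illustrative; main.py shapes velocity its own way:

```python
def velocity_envelope(t, total, peak=112, floor=40, outro_frac=0.15):
    """Crescendo toward a late climax, then fade through the outro."""
    outro_start = total * (1 - outro_frac)
    if t < outro_start:
        frac = t / outro_start                        # crescendo phase
        return int(floor + (peak - floor) * frac)
    frac = (t - outro_start) / (total - outro_start)  # fade phase
    return int(peak - (peak - floor) * frac)

print([velocity_envelope(t, 100) for t in (0, 50, 85, 100)])
# [40, 82, 112, 40] — quiet start, building, climax, fade to quiet
```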

Where the neural net fits in

The neural model influences:

  • melodic pitch target tendencies
  • velocity shaping
  • note lengths

…but the generator bounds and musicalizes the raw output so it stays playable, audible, and song-like.
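The "bounds and musicalizes" step can be sketched as a set of clamps on the raw network output; the exact ranges in main.py may differ:

```python
def musicalize(raw_pitch, raw_vel, raw_dur):
    """Clamp raw model output into playable, audible ranges."""
    pitch = int(max(48, min(84, round(raw_pitch))))           # piano register
    velocity = int(max(30, min(120, round(raw_vel * 127))))   # never silent
    duration = max(0.1, min(2.0, raw_dur))                    # audible lengths
    return pitch, velocity, duration

print(musicalize(200.7, 1.4, 0.01))  # (84, 120, 0.1)
```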


Controls you can tweak (in main.py)

Training

  • epochs (default: 50)
  • batch_size (default: 4096)
  • LR (learning rate inside the optimizer)

Dataset parsing

  • midi_to_tensor(... instrument=0, emotion=[...])
    • change instrument if your dataset uses a different program number
    • change emotion vector if you want different conditioning

Generation

The call at the bottom of the file is the main switchboard:

  • seconds=None
    • derive duration from the song form + a final hold
    • set a number (e.g., seconds=180) if you want a hard duration
  • bpm=84
    • slower = bigger cinematic feel
  • step_seconds=0.5
    • larger = more spacious melody rhythm
  • polyphony=3
    • affects chord thickness
  • style="cinematic"
    • currently the main style preset
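With seconds=None, the duration falls out of the song form and the tempo. A rough sketch of that arithmetic (the section lengths below are assumptions, not the repo's tables):

```python
def form_duration_seconds(bars_per_section, bpm=84, beats_per_bar=4):
    """Total bars in the form, times seconds per beat."""
    seconds_per_beat = 60.0 / bpm
    total_bars = sum(bars_per_section.values())
    return total_bars * beats_per_bar * seconds_per_beat

form = {"intro": 4, "verse": 8, "chorus": 8, "bridge": 4,
        "chorus2": 8, "outro": 4}
print(round(form_duration_seconds(form), 1))  # 102.9 seconds at 84 BPM
```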

Pro tip: run multiple times. The script uses randomness (seed selection + musical choices), so you get a new cue each run.


Troubleshooting

“I hear nothing” when playing the MIDI

  • Some online players don’t render audio (or require a SoundFont).
  • Try a DAW, MuseScore, or a player that supports SoundFonts.
  • Check that the notes have non-zero velocity (this project clamps velocities to avoid silent output).

ModuleNotFoundError: pretty_midi

pip install pretty_midi

RuntimeError: mat1 and mat2 must have the same dtype

  • This happens when inputs are float64 but model weights are float32.
  • The project forces tensors to float32 during dataset creation.

It trains slowly

  • Reduce epochs.
  • If you have a GPU and CUDA-enabled PyTorch, it will use it automatically.

Limitations (by design)

  • The “emotion” conditioning is currently constant (not derived from labels).
  • The model is a simple next-step predictor; the cinematic structure mostly comes from the generator logic.
  • This is a creative coding experiment, not a polished music model.

Repro tips

If you want the same output twice, add these near the top of main.py:

random.seed(0)
torch.manual_seed(0)

(You’ll also want to fix the seed_idx selection.)


License

You can do whatever the hell you wanna do.

About

A lightweight Python project that trains a small PyTorch “WavePINN” model on a MIDI dataset and generates a cinematic-style piano cue
