It seems that the underlying library produces good stereo output after music/vocal separation, but your code then downmixes those results to mono with ffmpeg: the channel count is hardcoded as ac=1 in lib/audio.py, in load_audio. Is it possible to return the audio as is, keeping its original channels?
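For illustration, here is a minimal sketch of how load_audio could take the channel count as a parameter instead of hardcoding ac=1. I'm assuming the current function decodes the file with ffmpeg into raw float32 PCM (f32le) on stdout. The signature load_audio(file, sr, channels=1) and the helpers build_ffmpeg_cmd and deinterleave are hypothetical names, and the sketch calls the ffmpeg CLI through subprocess rather than whatever wrapper the project actually uses.

```python
import subprocess

import numpy as np


def build_ffmpeg_cmd(path: str, sr: int, channels: int) -> list[str]:
    """Hypothetical helper: ffmpeg command that decodes `path` to raw
    float32 PCM on stdout, resampled to `sr` with `channels` channels."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0", "-i", path,
        "-f", "f32le", "-acodec", "pcm_f32le",
        "-ar", str(sr),
        "-ac", str(channels),  # was hardcoded to 1 (mono downmix)
        "-",  # write to stdout
    ]


def deinterleave(raw: bytes, channels: int) -> np.ndarray:
    """Hypothetical helper: turn interleaved float32 PCM bytes into an array.

    Mono returns a 1-D array (as before); multichannel returns
    shape (channels, n_samples).
    """
    samples = np.frombuffer(raw, dtype=np.float32)
    if channels == 1:
        return samples.copy()  # writable copy, like the old .flatten()
    if samples.size % channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # ffmpeg interleaves frames as L R L R ...; reshape to (frames, ch), then transpose.
    return samples.reshape(-1, channels).T.copy()


def load_audio(file: str, sr: int, channels: int = 1) -> np.ndarray:
    """Assumed signature: decode `file` with ffmpeg, keeping `channels` channels.

    channels=1 keeps the current mono behaviour; channels=2 keeps stereo.
    """
    proc = subprocess.run(build_ffmpeg_cmd(file, sr, channels),
                          capture_output=True, check=True)
    return deinterleave(proc.stdout, channels)
```

Defaulting to channels=1 keeps existing callers working; passing channels=2 would return a (2, n_samples) array, the same channel-first layout librosa uses with mono=False, so any downstream code that expects a 1-D array would need adjusting.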