christian-byrne
Collaborator

This changes LoadAudio to allow you to use all audio files supported by torchaudio.

This is the function torchaudio (2.3.1) is using to load audio files:

    def load(
        uri: Union[BinaryIO, str, os.PathLike],
        frame_offset: int = 0,
        num_frames: int = -1,
        normalize: bool = True,
        channels_first: bool = True,
        format: Optional[str] = None,
        buffer_size: int = 4096,
        backend: Optional[str] = None,
    ) -> Tuple[torch.Tensor, int]:
        """Load audio data from source.

        By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
        ``float32`` dtype, and the shape of `[channel, time]`.

        Note:
            The formats this function can handle depend on the availability of backends.
            Please use the following functions to fetch the supported formats.

            - FFmpeg: :py:func:`torchaudio.utils.ffmpeg_utils.get_audio_decoders`
            - Sox: :py:func:`torchaudio.utils.sox_utils.list_read_formats`
            - SoundFile: Refer to `the official document <https://pysoundfile.readthedocs.io/>`__.
        rest of docstring...
        """

Here is an alternative to hardcoding the supported formats:

  • soundfile: soundfile.available_formats()
  • sox: torchaudio.utils.sox_utils.list_read_formats()
  • ffmpeg: torchaudio.utils.ffmpeg_utils.get_audio_decoders(), but this returns the codecs, not the actual file extensions. To get the extensions, you can run ffmpeg -formats in a subprocess and parse the output.
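The ffmpeg route could look roughly like the sketch below. It is not the code from this PR: `parse_ffmpeg_demuxers` and `ffmpeg_read_formats` are hypothetical helper names, and the parser assumes the two-column flag layout (`D`/`E`) that `ffmpeg -formats` prints after a `--` separator line, which may differ between ffmpeg builds. The names ffmpeg reports are format names, so treating them as file extensions is itself an approximation.

```python
from __future__ import annotations

import subprocess


def parse_ffmpeg_demuxers(formats_output: str) -> list[str]:
    """Return dotted names of every format ffmpeg reports as demuxable.

    Assumes rows shaped like ' DE wav   WAV / WAVE' after a '--' line.
    """
    names: set[str] = set()
    in_table = False
    for line in formats_output.splitlines():
        stripped = line.strip()
        if not in_table:
            # Everything before the '--' separator is legend text.
            in_table = stripped.startswith("--")
            continue
        parts = stripped.split(None, 2)
        if len(parts) < 2:
            continue
        flags, aliases = parts[0], parts[1]
        if "D" not in flags:
            continue  # mux-only format, cannot be read
        # One row can carry several comma-separated aliases (e.g. matroska,webm).
        for alias in aliases.split(","):
            names.add("." + alias)
    return sorted(names)


def ffmpeg_read_formats(ffmpeg_bin: str = "ffmpeg") -> list[str]:
    """Run `ffmpeg -formats` and parse it; return [] if ffmpeg is unavailable."""
    try:
        result = subprocess.run(
            [ffmpeg_bin, "-hide_banner", "-formats"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_ffmpeg_demuxers(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be checked against captured output without ffmpeg installed.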

Those are the techniques I used to generate the hardcoded lists. Sometimes the audio player widget doesn't support the format, but everything else will still work (e.g., aiff).

sox_formats = ['.8svx', '.aif', '.aifc', '.aiff', '.aiffc', '.al', '.amb', '.amr-nb', '.amr-wb', '.anb', '.au', '.avr', '.awb', '.caf', '.cdda', '.cdr', '.cvs', '.cvsd', '.cvu', '.dat', '.dvms', '.f32', '.f4', '.f64', '.f8', '.fap', '.flac', '.fssd', '.gsm', '.gsrt', '.hcom', '.htk', '.ima', '.ircam', '.la', '.lpc', '.lpc10', '.lu', '.mat', '.mat4', '.mat5', '.maud', '.nist', '.ogg', '.paf', '.prc', '.pvf', '.raw', '.s1', '.s16', '.s2', '.s24', '.s3', '.s32', '.s4', '.s8', '.sb', '.sd2', '.sds', '.sf', '.sl', '.sln', '.smp', '.snd', '.sndfile', '.sndr', '.sndt', '.sou', '.sox', '.sph', '.sw', '.txw', '.u1', '.u16', '.u2', '.u24', '.u3', '.u32', '.u4', '.u8', '.ub', '.ul', '.uw', '.vms', '.voc', '.vorbis', '.vox', '.w64', '.wav', '.wavpcm', '.wv', '.wve', '.xa', '.xi']
supported.update(sox_formats)
if "ffmpeg" in available_backends:
ffmpeg_formats = ['.3dostr', '.4xm', '.aa', '.aac', '.aax', '.ace', '.acm', '.act', '.adf', '.adp', '.ads', '.aea', '.afc', '.aix', '.alias_pix', '.amrnb', '.amrwb', '.anm', '.apac', '.apc', '.ape', '.aqtitle', '.argo_brp', '.asf_o', '.av1', '.avr', '.avs', '.bethsoftvid', '.bfi', '.bfstm', '.bin', '.bink', '.binka', '.bitpacked', '.bmp_pipe', '.bmv', '.boa', '.bonk', '.brender_pix', '.brstm', '.c93', '.cdg', '.cdxl', '.cine', '.concat', '.cri_pipe', '.dcstr', '.dds_pipe', '.derf', '.dfa', '.dhav', '.dpx_pipe', '.dsf', '.dsicin', '.dss', '.dtshd', '.dvbsub', '.dvbtxt', '.dxa', '.ea', '.ea_cdata', '.epaf', '.exr_pipe', '.flic', '.frm', '.fsb', '.fwse', '.g729', '.gdv', '.gem_pipe', '.genh', '.gif_pipe', '.hca', '.hcom', '.hdr_pipe', '.hnm', '.idcin', '.idf', '.iec61883', '.iff', '.ifv', '.imf', '.ingenient', '.ipmovie', '.ipu', '.iss', '.iv8', '.ivr', '.j2k_pipe', '.jack', '.jpeg_pipe', '.jpegls_pipe', '.jpegxl_pipe', '.jv', '.kmsgrab', '.kux', '.laf', '.lavfi', '.libcdio', '.libdc1394', '.libgme', '.libopenmpt', '.live_flv', '.lmlm4', '.loas', '.luodat', '.lvf', '.lxf', '.matroska', '.webm', '.mca', '.mcc', '.mgsts', '.mjpeg_2000', '.mlv', '.mm', '.mods', '.moflex', '.mov', '.mp4', '.m4a', '.3gp', '.3g2', '.mj2', '.mpc', '.mpc8', '.mpegtsraw', '.mpegvideo', '.mpl2', '.mpsub', '.msf', '.msnwctcp', '.msp', '.mtaf', '.mtv', '.musx', '.mv', '.mvi', '.mxg', '.nc', '.nistsphere', '.nsp', '.nsv', '.nuv', '.openal', '.paf', '.pam_pipe', '.pbm_pipe', '.pcx_pipe', '.pfm_pipe', '.pgm_pipe', '.pgmyuv_pipe', '.pgx_pipe', '.phm_pipe', '.photocd_pipe', '.pictor_pipe', '.pjs', '.pmp', '.png_pipe', '.pp_bnk', '.ppm_pipe', '.psd_pipe', '.psxstr', '.pva', '.pvf', '.qcp', '.qdraw_pipe', '.qoi_pipe', '.r3d', '.realtext', '.redspark', '.rka', '.rl2', '.rpl', '.rsd', '.s337m', '.sami', '.sbg', '.scd', '.sdns', '.sdp', '.sdr2', '.sds', '.sdx', '.ser', '.sga', '.sgi_pipe', '.shn', '.siff', '.simbiosis_imx', '.sln', '.smk', '.smush', '.sol', '.stl', '.subviewer', '.subviewer1', 
'.sunrast_pipe', '.svag', '.svg_pipe', '.svs', '.tak', '.tedcaptions', '.thp', '.tiertexseq', '.tiff_pipe', '.tmv', '.tty', '.txd', '.ty', '.v210', '.v210x', '.vag', '.vbn_pipe', '.vividas', '.vivo', '.vmd', '.vobsub', '.vpk', '.vplayer', '.vqf', '.wady', '.wavarc', '.wc3movie', '.webp_pipe', '.wsd', '.wsvqa', '.wve', '.x11grab', '.xa', '.xbin', '.xbm_pipe', '.xmd', '.xmv', '.xpm_pipe', '.xvag', '.xwd_pipe', '.xwma', '.yop']
Contributor

Is this really the best way to do this? Is there no variable to read, or some other way to get this list dynamically?

Collaborator Author

The methods I mentioned in the PR description can generate the lists dynamically. Here are the trade-offs as far as I can tell:

  • For soundfile, you have to import soundfile, which may or may not match the version used by torchaudio.
  • For ffmpeg, you need a way to map codecs to file extensions if using torchaudio.utils.ffmpeg_utils.get_audio_decoders(), or use subprocess.

The reason I hard-coded the lists was because I didn't think the supported formats in each library were volatile enough to warrant that overhead. What do you think?

Contributor

Maybe it'd be better to not list formats at all, and instead just try to load the file regardless of its extension and raise an error if loading fails.
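A minimal sketch of that load-and-report approach, not the node's actual code. `try_load_audio` is a hypothetical name, and the `loader` parameter is an assumption added so the sketch can be exercised without torchaudio installed; by default it falls back to `torchaudio.load`, which returns a `(waveform, sample_rate)` tuple.

```python
from typing import Any, Callable, Optional, Tuple


def try_load_audio(
    path: str,
    loader: Optional[Callable[[str], Tuple[Any, int]]] = None,
) -> Tuple[Any, int]:
    """Attempt to load `path` without checking its extension first."""
    if loader is None:
        # Imported lazily so the sketch also works with an injected loader.
        import torchaudio

        loader = torchaudio.load
    try:
        return loader(path)
    except Exception as exc:
        # Surface one clear error, whichever backend rejected the file.
        raise RuntimeError(f"Could not load {path!r} as audio: {exc}") from exc
```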

Contributor

I don't think blindly loading regardless of extension is a good idea, as video and image files would also be recognized and mixed into the selection choices.
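To illustrate the concern, here is a rough sketch of how an extension allow-list keeps unrelated files out of the selection choices. `filter_audio_choices` is a hypothetical helper, not the node's real code; `supported` stands for a set of dotted, lowercase extensions like the one built in the diff.

```python
import os
from typing import Iterable, List, Set


def filter_audio_choices(filenames: Iterable[str], supported: Set[str]) -> List[str]:
    """Keep only the files whose extension is in the allow-list."""
    return sorted(
        name
        for name in filenames
        # Compare case-insensitively so 'song.WAV' matches '.wav'.
        if os.path.splitext(name)[1].lower() in supported
    )
```

Without such a filter, every file in the input directory would show up as a choice, including images and text files.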

@christian-byrne
Collaborator Author

torchaudio can also load and extract audio from video files if ffmpeg is available, which is why video formats are included in the ffmpeg list.

I have been using the LoadAudio node with videos and can confirm it works.

@mcmonkey4eva added and removed the "User Support" label ("A user needs help with something, probably not a bug.") on Sep 12, 2024
@christian-byrne
Collaborator Author

Implemented by #4054.

@christian-byrne deleted the audio-filetypes branch on September 16, 2024 at 19:42