Audio Feature Extraction for Audio-to-Video Generation #34

@suimuc

Hello, I’m currently exploring the audio-to-video script in this repository and would like to understand how the audio features are extracted. Specifically, the STFT features in the stft_pickle data have a shape of (90, 45, 17), while the corresponding video has 90 frames. Could you explain how the STFT (Short-Time Fourier Transform) features are computed?
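
To make the question concrete, here is a minimal sketch of one way an array of shape (90, 45, 17) could arise. It is only a guess and not taken from this repository: the function name `per_frame_stft`, the 16 kHz sample rate, the 30 fps frame rate, the FFT size of 32 (which gives 32 / 2 + 1 = 17 frequency bins), and the reading of 45 as the number of STFT windows per video frame are all assumptions.

```python
import numpy as np


def per_frame_stft(audio, num_frames, n_fft=32, steps_per_frame=45):
    """Hypothetical per-video-frame STFT magnitudes.

    Splits `audio` into `num_frames` equal chunks (one per video frame) and
    computes `steps_per_frame` windowed FFTs inside each chunk.
    Returns an array of shape (num_frames, steps_per_frame, n_fft // 2 + 1).
    """
    chunk_len = len(audio) // num_frames
    if chunk_len < n_fft:
        raise ValueError("each per-frame audio chunk must hold at least n_fft samples")

    window = np.hanning(n_fft)
    n_bins = n_fft // 2 + 1
    features = np.empty((num_frames, steps_per_frame, n_bins), dtype=np.float32)

    for i in range(num_frames):
        chunk = audio[i * chunk_len:(i + 1) * chunk_len]
        # Evenly spaced window start positions covering the chunk (assumed layout).
        starts = np.linspace(0, chunk_len - n_fft, steps_per_frame).astype(int)
        for j, start in enumerate(starts):
            segment = chunk[start:start + n_fft] * window
            # Magnitude spectrum of a real-valued signal: n_fft // 2 + 1 bins.
            features[i, j] = np.abs(np.fft.rfft(segment))
    return features


# Assumed setup: 3 seconds of a 1 kHz tone at 16 kHz, matching 90 frames at 30 fps.
sample_rate = 16000
t = np.arange(3 * sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 1000.0 * t)

features = per_frame_stft(audio, num_frames=90)
print(features.shape)
```

If the real pipeline works along these lines, the first axis would index video frames, the second the STFT time steps within a frame, and the third the frequency bins. Confirmation of the actual parameters (sample rate, window size, hop length, and whether magnitudes or log-magnitudes are stored) would be appreciated.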
