Audio Feature Extraction for Audio-to-Video Generation #34

Hello, I'm exploring the audio-to-video script in this repository and would like to understand how the audio features are extracted. Specifically, the STFT features in the `stft_pickle` data have shape (90, 45, 17), while the corresponding video has 90 frames. Could you explain how these STFT (Short-Time Fourier Transform) features are computed?
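For reference, here is one way an array of shape (90, 45, 17) *could* arise. This is my guess, not the repository's code. It assumes 16 kHz mono audio and 25 fps video, so each video frame lines up with 640 audio samples. It also assumes a hypothetical `per_frame_stft` helper that uses an 88-point periodic Hann window (88 // 2 + 1 = 45 frequency bins) and a hop of 34 samples, which gives 1 + (640 - 88) // 34 = 17 STFT steps per video frame. I chose every parameter only to reproduce the shape. The real script may use different values, log or mel scaling, or the opposite axis order (45 time steps by 17 frequency bins).

```python
import numpy as np


def stft_magnitude(segment: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Magnitude STFT of a 1-D signal, returned as (n_fft // 2 + 1, num_steps).

    Uses a periodic Hann window and no padding: frames start at 0, hop, 2*hop, ...
    """
    if len(segment) < n_fft:
        raise ValueError("segment is shorter than n_fft")
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window of length n_fft
    num_steps = 1 + (len(segment) - n_fft) // hop
    # Overlapping analysis frames stacked as rows: (num_steps, n_fft)
    frames = np.stack([segment[i * hop : i * hop + n_fft] for i in range(num_steps)])
    spectrum = np.fft.rfft(frames * window, axis=1)  # (num_steps, n_fft // 2 + 1)
    return np.abs(spectrum).T  # (freq_bins, num_steps)


def per_frame_stft(audio: np.ndarray, sample_rate: int, fps: int,
                   num_video_frames: int, n_fft: int = 88, hop: int = 34) -> np.ndarray:
    """Hypothetical helper: one STFT per video frame -> (frames, freq_bins, steps).

    All defaults are assumptions chosen to reproduce a (90, 45, 17) shape.
    """
    samples_per_frame = int(round(sample_rate / fps))
    needed = samples_per_frame * num_video_frames
    if len(audio) < needed:
        # Zero-pad short clips so every video frame gets a full audio slice
        audio = np.pad(audio, (0, needed - len(audio)))
    features = [
        stft_magnitude(audio[k * samples_per_frame : (k + 1) * samples_per_frame], n_fft, hop)
        for k in range(num_video_frames)
    ]
    return np.stack(features)


# Usage with placeholder audio: 90 frames at 25 fps is 3.6 s, i.e. 57600 samples at 16 kHz
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000 * 90 // 25)
feats = per_frame_stft(audio, sample_rate=16000, fps=25, num_video_frames=90)
print(feats.shape)  # (90, 45, 17)
```

Splitting the audio into one slice per video frame keeps the first axis aligned with the 90 video frames, which matches the shape in `stft_pickle`. Whether the script actually does this, or computes one long STFT and reshapes it, is exactly what I'm asking about.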
