Full disclosure: I have no qualifications in the field of AI; I'm just following its evolution, out of interest, as a programmer and former musician.
Just wanted to share something I've been noticing about the generated samples. Might be useful, might not. 🙂
It's especially obvious in the generated samples of Aretha Franklin and Frank Sinatra: the model seems to have learned "recording quality", particularly of the vocals, and it applies it sporadically.
Within moments of the same sample, you can hear the recording quality of the vocals change drastically. This likely reflects the fact that Aretha and Frank both recorded over long periods of time, so the media, microphones, mastering, etc. changed and produced quite radically different recordings, and the model seems to switch between these at random, roughly every word.
This is much less evident in, for example, some of the modern pop recordings, as those are generally much more streamlined according to current trends. (To put it bluntly, a lot of pop music sounds "the same".)
I'm wondering if you could add this to the model? At its simplest, maybe start by incorporating the recording year, data that should be easy to obtain. Something like the original recording medium or microphone type is probably almost impossible to find, but equipment trends roughly follow the years, so the year alone might be enough to make a difference.
Imagine being able to ask for a recording of modern rap or pop artists in '40s quality, or Aretha Franklin in 2020. 🙂
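In case it helps make the idea concrete, here's a minimal sketch in PyTorch of what I have in mind. It's purely illustrative: the class, names, and channel size are all made up, and I don't know how (or whether) it would slot into the actual conditioning code. The idea is just that the year, bucketed by decade, gets its own learned embedding, the same way an artist or genre label might.

```python
import torch
import torch.nn as nn

class YearConditioner(nn.Module):
    """Embed the recording year, bucketed by decade (illustrative only)."""
    def __init__(self, min_year=1920, max_year=2029, emb_channels=512):
        super().__init__()
        self.min_year = min_year
        self.n_decades = (max_year - min_year) // 10 + 1
        # One learned vector per decade: equipment and mastering trends
        # track decades more closely than individual years.
        self.decade_emb = nn.Embedding(self.n_decades, emb_channels)

    def forward(self, year):
        # year: LongTensor of shape (batch,), e.g. tensor([1945, 2020])
        decade = ((year - self.min_year) // 10).clamp(0, self.n_decades - 1)
        return self.decade_emb(decade)  # (batch, emb_channels)

# The returned vector would then presumably be summed into the model's
# existing conditioning, alongside the artist/genre embeddings.
year_cond = YearConditioner()
print(year_cond(torch.tensor([1945, 2020])).shape)  # torch.Size([2, 512])
```

Bucketing by decade (rather than one embedding per year) would also keep the table small and give each bucket enough training examples, which seems to fit the observation that quality shifts happen on roughly a decade scale.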
Anyhow, interesting project! Cheers.