Threaded & render-synced FFT #34
base: main
Conversation
Looks good so far, just a few little things. The FFT calculation should be done on the audio thread to avoid race conditions when the audio card writes samples into the circular buffer while the FFT is being calculated. It also helps to spread the load between CPU cores (which we have enough of these days), because the vvvv main loop is usually single-threaded, and if the FFT runs on the render thread it takes performance away from the CPU core that renders the graphics and logic. The audio thread also knows when new audio information is available. You could alternatively use a dedicated thread that only does FFT, but then you need to add sync locks around the buffer copy between threads to avoid the race conditions mentioned above.
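A minimal sketch of that lock-guarded hand-off, assuming a hypothetical `LockedCircularBuffer` (names and sizes are made up here, not taken from this repository):

```csharp
using System;

// Hypothetical sketch, not VL.Audio code: a circular buffer shared between the
// audio thread (writer) and a dedicated FFT thread (reader). The lock only
// guards the write and the snapshot copy; the FFT itself runs outside of it.
public class LockedCircularBuffer
{
    private readonly object gate = new object();
    private readonly float[] buffer;
    private int writePos;

    public LockedCircularBuffer(int size) => buffer = new float[size];

    // Audio thread: called whenever the sound card delivers a new block.
    public void Write(float[] samples)
    {
        lock (gate)
        {
            foreach (var s in samples)
            {
                buffer[writePos] = s;
                writePos = (writePos + 1) % buffer.Length;
            }
        }
    }

    // FFT thread: take a snapshot under the lock, analyse it afterwards.
    public float[] Snapshot()
    {
        var copy = new float[buffer.Length];
        lock (gate)
            Array.Copy(buffer, copy, buffer.Length);
        return copy;
    }
}
```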
Right, I see. Could you give a quick rundown on how the audio pipeline works? I'm guessing running the FFT in the audio thread would block it for a while. The 8192-sample window took about 2 ms to execute on my laptop (which was in low-power mode at the time). Multithreading it would definitely be useful. I'm considering how to sync that with the main thread, so as not to do too much work but still always have a new frame available. I figure a one-frame delay is workable. If the main thread triggers a new processing round on the FFT thread, the result of which is then made available on the next frame, that would be acceptable. I'm not familiar with how node lifetimes work behind the scenes; can you just create the thread on node instantiation and dispose it when the node is removed? Regarding thread sync, and in the interest of not blocking the audio thread, I assume copying the buffer when starting the FFT processing step is the way to go. Would it be sensible to have a flow as such:
That was a lot of questions! Thank you for your help 😊 edit: I have not used multithreading/reactive within vvvv, so I have no idea if it could be solved in VL rather than in C#. edit2: I realise I way overcomplicated this... Working on an update right now. Still not sure how to correctly implement create/dispose events though, looking into it.
There we go, I think this should be a valid implementation. The FFT node uses just a couple of microseconds even at absurd FFT sizes, and it looks like the thread is properly disposed. I haven't very thoroughly tested all the ways vvvv can exit, so please weigh in if you have any insights on the details. I also removed the ReadWindowed function and did the window processing in the FFT thread to minimise the work done while holding a lock (the windowed read behaviour was only used by the FFT anyway). All access to the circular buffer is now locked. I think the behaviour should not be mirrored in the CircularPullBuffer for the time being if it's already working fine as it is now. Maybe a note should be made so that any future divergence between the classes is avoided. The VL diff only changes the function used to get the FFT.
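For readers following along, a hedged sketch of the lifecycle pattern being described (hypothetical names, not the PR's actual code): the node owns a worker thread that is started on creation, woken by a trigger, and shut down in Dispose, with the window applied on the FFT thread after the locked copy.

```csharp
using System;
using System.Threading;

// Sketch only: worker thread tied to the node's lifetime.
public class FftWorker : IDisposable
{
    private readonly Thread thread;
    private readonly AutoResetEvent trigger = new AutoResetEvent(false);
    private volatile bool running = true;

    public FftWorker()
    {
        thread = new Thread(Loop) { IsBackground = true, Name = "FFT" };
        thread.Start();
    }

    // Ask for a new FFT round; the result is read back later.
    public void RequestFft() => trigger.Set();

    private void Loop()
    {
        while (running)
        {
            trigger.WaitOne();
            if (!running) break;
            // 1. copy the circular buffer under its lock (cheap)
            // 2. apply the window function to the copy (no lock held)
            // 3. run the FFT and publish the result
        }
    }

    public void Dispose()
    {
        running = false;
        trigger.Set();   // wake the thread so it can exit
        thread.Join();
        trigger.Dispose();
    }
}
```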
Nice, looks pretty good. But the frame latency you mentioned is definitely not wanted. It is already difficult to get audio-responsive graphics for live input as it is, because you have audio input latency + graphics output latency. Adding another frame is not desired. So the processing trigger should come from the audio thread, whenever new data is there. This is the moment when a buffer is written into the circular buffer. I also am not sure about all the locks and how they would affect existing implementations. Maybe it would be better to make a new buffer class for that.
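In code terms, the suggestion amounts to something like the following (again hypothetical names, reusing the sketches above): signal the worker at the point where the audio thread writes into the ring buffer, rather than from the render loop.

```csharp
// Sketch: called on the audio thread for each incoming block.
public void OnAudioData(float[] samples)
{
    circularBuffer.Write(samples);   // existing write into the ring buffer
    fftWorker.RequestFft();          // wake the FFT thread; no extra frame of delay
}
```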
maybe consider handing the data over as an observable?
CircularBuffer is not used by anything except the FFT module at the moment; in fact only the ReadDouble method is ever referenced. (The Read function could also be improved by doing two array copies rather than an element-wise copy, see the sketch below.) CircularPullBuffer is used a lot more, it seems. Maybe the plain CircularBuffer could just be renamed? I'm not sure how I'm supposed to implement it on the audio thread without doing excessive processing (assuming it's still run in a separate thread). Something has to trigger the whole copy/process/present cycle. Ideally, if the FFT takes n μs, you'd start the processing n μs before the next frame update so you have the absolutely freshest data ready. You could run it continuously, which would almost saturate a CPU core, but for a 2 ms FFT that means a latency between 0 and 2 ms, 1 ms on average. You could potentially use some kind of heuristic for when to start the FFT, but if you underestimate the time, the worst case is 2 frames of latency or frame skips if you miss it. I do agree that (@60 fps) ~17 ms of extra latency is pretty bad; just by itself it's getting close to what humans can differentiate. Is there something obvious I'm misunderstanding in how it's supposed to be synced? I did consider whether you could do all of this in VL, but I have not actually done any asynchronous programming within vvvv yet. It would allow for greater flexibility; on the other hand, my unqualified guess is that a couple of
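As an aside, the two-copy read mentioned above could look roughly like this (illustrative only; `buffer` stands for the ring buffer's backing array, and the destination is assumed to be no longer than it):

```csharp
// Read destination.Length samples starting at readPos, wrapping at most once,
// using two block copies instead of an element-wise loop (Array is System.Array).
public void Read(float[] destination, int readPos)
{
    int firstPart = Math.Min(destination.Length, buffer.Length - readPos);
    Array.Copy(buffer, readPos, destination, 0, firstPart);
    if (firstPart < destination.Length)
        Array.Copy(buffer, 0, destination, firstPart, destination.Length - firstPart);
}
```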
i think you misunderstood my proposal. let me try to explain better. in this branch i modified your code in a minimal way to return the fft data as an observable. this is how that looks implemented in vvvv: as you can see, we have even less running in the mainloop (just the Trigger), need to care less about syncing and have the ability to do additional processing in the analysis thread. i also changed the helppatch in that branch, so you can have a look.
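On the C# side, the observable idea boils down to something like this rough sketch (not the code from the linked branch; it just uses Rx's Subject to push each finished spectrum to subscribers, so downstream processing runs on the analysis thread):

```csharp
using System;
using System.Reactive.Subjects;

// Sketch only: the analysis thread publishes spectra; the patch subscribes.
public class FftSource : IDisposable
{
    private readonly Subject<float[]> spectra = new Subject<float[]>();

    // Exposed to the patch / downstream code.
    public IObservable<float[]> Spectra => spectra;

    // Called on the FFT thread once a spectrum has been computed.
    internal void Publish(float[] spectrum) => spectra.OnNext(spectrum);

    public void Dispose() => spectra.Dispose();
}

// Usage sketch: OnNext is raised on the analysis thread, so this callback
// (and anything chained after it) runs off the main loop.
// fftSource.Spectra.Subscribe(spectrum => ProcessFurther(spectrum));
```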
Hm, I see. I realise now that Reactive is a C# feature (Rx), not just a vvvv concept! I think I understand it a bit better now. How is the performance of using Reactive compared to just simple locks? I figure Reactive has more overhead since it's a more extensive/complex system, but I can't fathom what kind of scale we're talking about. It could be worth it for being able to easily run additional logic inside vvvv, though. Maybe a separate FFT (Reactive) node could be added? As I understand it, this does not solve the frame-delay issue though, am I right? I figure it might be OK in the meantime regardless, as the current implementation already has a higher latency (except for window sizes <= 256 samples) plus loads of jitter depending on how the FFT/render frames line up. I'll try adding a toggle for a trigger that runs on new-samples-available, which would provide the theoretically lowest latency at the cost of CPU load.
First, I want to say thank you for improving the FFT situation! Here is an idea for how the frame delay issue could be solved: the sample counter could be at the end of the Regarding the new thread lock modifications of the CircularBuffer, I'd separate that, even though the current version is not in use anywhere else. The existing one might come in handy for other audio transformations, or in case we keep the current state of the FFT as a legacy node. Then it needs to have the exact same code paths as it has right now.
Actually coded up just that today but didn't push :P The new commit gives the FFT node a pin which chooses whether to run frame-synced, which is CPU-optimal but has a 1-frame delay, or to run as quickly as possible whenever new data has arrived. For the second mode, the trigger is set every time new data is written to the buffer. This means that if the calculation is already running, it will just be primed to run again as soon as it's ready. Being able to toggle it with a pin is nice for debug/release-type work. There's definitely a question in there of how to handle the change so as not to make it breaking. You can add new pins without breaking, but there's still the fact that the behaviour has changed. I think for most projects it would just be an improvement, but there might be just a few that rely on the specific way the old FFT worked; is this significant enough to warrant keeping the old FFT node in, without a name change? I think moving forward is the better option; old projects probably run on older versions of vvvv regardless. As for reactive or not: I think one reason not to make the standard FFT reactive is that for new users the FFT should "just work" without having to fiddle with additional nodes. I'm all for adding a separate reactive version of it, though.
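A rough sketch of the two modes as described (hypothetical names, reusing the earlier sketches): an AutoResetEvent naturally gives the "primed to run again" behaviour, because setting it while the FFT is still busy leaves it signalled, so the worker loops straight into the next calculation when it finishes.

```csharp
// Sketch only: pin-selectable sync mode.
public enum FftSync { FrameSynced, OnNewSamples }

public FftSync Mode { get; set; } = FftSync.FrameSynced;

// Main loop (render tick): frame-synced mode, result picked up next frame.
public void OnFrame()
{
    if (Mode == FftSync.FrameSynced)
        fftWorker.RequestFft();
}

// Audio thread: free-running mode, lowest latency at the cost of CPU.
public void OnAudioData(float[] samples)
{
    circularBuffer.Write(samples);
    if (Mode == FftSync.OnNewSamples)
        fftWorker.RequestFft();
}
```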
Great. For legacy nodes, we have the Obsolete feature. The original node gets renamed to FFT (Old1) and a conversion utility will replace it in files saved prior to the first version that includes the new node. Did you already push the changes? Is it still in its own thread and gets signalled by the audio thread?
Changes:
Related forum thread
Hi! First off, this is mostly a proposal, as I'd like more input on the implementation specifics. This PR makes it so that the FFT updates on frame ticks, rather than when the ring buffer fills up. This allows for much greater detail in the lower frequencies while still maintaining a high update rate. I find this makes more sense for real-time visualisation, as you always want fresh data on render. Additionally, being able to run very large window sizes (8192+) allows for actual pitch tracking in the bass frequencies, which looks really nice for some effects I want.
The current issue is that a low window size of 512 means that roughly every 6th frame will be skipped. If you halve that size you can get >60 FPS, but with even lower bass resolution as well as unnecessary calculations (the n·log(n) complexity makes that a non-issue, however).
I do understand there are other applications for the node: a drawback is that the same audio data will be processed multiple times due to overlapping windows, which for some use cases is unnecessary, and the lower processing cost would be preferred. (As I understand it, the peak/worst-case processing cost is not reduced though?) Therefore, I'm thinking maybe of adding this as a second node, leaving the old node with the buffer-synced behaviour. Call it "FFT (frame-based)"? I figure making it non-breaking is important here.
As a side note, I'm looking at using a different FFT implementation as well as a purely real FFT for better speed. SharpFFTW (MIT licensed) looks to be the fastest, although it's not on NuGet and wraps C. Any particular preferences in that regard?
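For context on the "purely real FFT" point: the spectrum of a real-valued signal is conjugate-symmetric, so only N/2 + 1 bins carry unique information, which is what real-to-complex transforms exploit to roughly halve the work and memory. A tiny, naive-DFT illustration of that property (demonstration only, not a library recommendation):

```csharp
using System;
using System.Numerics;

// For real input x, the DFT satisfies X[N - k] == conj(X[k]).
class RealSymmetryDemo
{
    static void Main()
    {
        var rng = new Random(1);
        var x = new double[8];
        for (int i = 0; i < x.Length; i++) x[i] = rng.NextDouble() - 0.5;

        int n = x.Length;
        var spectrum = new Complex[n];
        for (int k = 0; k < n; k++)
            for (int t = 0; t < n; t++)
                spectrum[k] += x[t] * Complex.Exp(new Complex(0, -2 * Math.PI * k * t / n));

        for (int k = 1; k < n / 2; k++)
            Console.WriteLine($"bin {k}: {spectrum[k]}  vs  conj(bin {n - k}): {Complex.Conjugate(spectrum[n - k])}");
    }
}
```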