Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threaded & render-synced FFT #34

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

grufkork
Copy link

@grufkork grufkork commented Feb 24, 2025

Changes:

  • The CalcFFT function is called on render from VVVV, instead of from the audio pipeline when the circle buffer loops around
  • The FFTOut property is replaced by a function
  • Max window size is increased to 8192 - although performance is getting iffy around here, ~2ms+
  • The only change in the Audio.vl patch is that the getter for FFTOut is replaced with a method call. The FFT node is otherwise unchanged and backwards compatible

Related forum thread

Hi! First off, this is mostly a proposal, as I'd like more input on implementation specifics on the matter. This PR makes it so that the FFT updates on frame ticks, rather than when the ring buffer fills up. This allows for much greater detail in the lower frequencies, while still maintaining a high update rate. I find this makes more sense for real-time visualisation, as you always want fresh data on render. Additionally, being able to run very large window sizes (8192+) allows for actual pitch tracking in the bass frequencies, which looks really nice for some effects I want.

The current issue is that a low window size of 512 means that every ~6th frame will be skipped. If you halve that size you can get >60FPS, but with even lower bass resolution as well as unnecessary calculations (n*log(n) complexity makes that a non-issue however).

I do understand there are other applications for the node: a drawback is that the same audio data will be processed multiple times due to overlapping windows, which for some use cases is not necessary and the lower processing cost would be preferred. (As I understand the peak/worst-case processing cost is not reduced though?) Therefore, I'm thinking maybe of adding this a second node, leaving the old node with the buffer-synced behaviour. Call it "FFT (frame-based)"? I figure making it non-breaking is important for this matter.

As a side-note, I'm looking at using a different FFT implementation as well as using a purely real FFT for better speed. SharpFFTW (MIT Licensed) looks to be the fastest, although it's not on NuGet and wraps C. Any particular preferences in regards to the question?

@tebjan
Copy link
Member

tebjan commented Feb 25, 2025

Looks good so far, just a few little things. The FFT calculation should be done on the audio thread to avoid race conditions when the audio card writes samples into the circular buffer while the FFT is calculated.

It also helps to spread performance between CPU cores (which we have enough of, these days), because the vvvv main loop is usually single-threaded and if the FFT is on the render thread, you reduce the performance of the CPU core that renders the graphics and logic.

And the audio thread knows when there is any new audio information available. But you could also think of a dedicated thread that only does FFT but then you need to add sync locks around the buffer copy between threads to avoid these race conditions mentioned above.

@grufkork
Copy link
Author

grufkork commented Feb 25, 2025

Right, I see. Could you give a quick rundown on how the audio pipeline works? I'm guessing running FFT in the audio thread would block it for a while. The 8192-window took about 2ms to execute on my laptop (which was in low-power mode at the time).

Mutlithreading it would definitely be useful. I'm considering how to sync that with the main thread, as to not do too much work but still always have a new frame available. I figure a one-frame delay is workable. If the main thread triggers a new processing round on the FFT thread, the result of which is then made available on the next frame, that would be acceptable. I'm not familiar with how node lifetimes work behind-the-scenes, can you just create the thread on node instantiation and dispose it when the node is removed?

In regards to thread sync and in interest of not blocking the audio thread, I assume copying the buffer when starting the FFT processing step is the way to go. Would it be sensible to have a flow as such:

  • Main thread triggers FFT thread
  • FFT thread requests data from buffer/audio thread
  • At the end of the next Write in the CircularBuffer (these should happen many times per frame, in chunks of ~50 samples?), the audio thread sends a copy of the buffer to the FFT thread, triggering it to begin processing
  • FFT thread makes its latest result available to the main thread through a lock/mutex

That was a lot of questions! Thank you for your help 😊

edit: I have not used multithreading/reactive within vvvv, so I have no idea if it could be solved in the VL rather than the C#.

edit2: I realise I way complicated this... Working on an update right now. Still not sure how to correctly implement create/dispose events though, looking into it,

@grufkork
Copy link
Author

There we go, I think this should be a valid implementation. The FFT node uses just a couple of microseconds even at absurd FFT sizes, and it looks like the thread is properly disposed. I haven't very thoroughly tested all the ways vvvv can exit, so please weigh in if you have any insights on the details. I also removed the ReadWindowed function and did the window processing in the FFT thread to minimise processing done while in a lock (the windowed read behaviour was only used by the FFT anyway).

All access to the circular buffer is now locked. I think the behaviour should not be mirrored in the CircularPullBuffer for the time being if it's already working fine as it is now. Maybe a mark should made so any divergence between the classes in the future is avoided.

The VL diff is only changing the function used to get the FFT.

@grufkork grufkork marked this pull request as ready for review February 25, 2025 14:25
@grufkork grufkork changed the title Render-synced FFT Threaded & render-synced FFT Feb 25, 2025
@tebjan
Copy link
Member

tebjan commented Feb 25, 2025

Nice, looks pretty good. But the frame latency you mentioned is definitely not wanted. It is already difficult to get audio-responsive graphics for live input as it is, because you have audio input latency + graphics output latency. Adding another frame is not desired. So the processing trigger should come from the audio thread, whenever new data is there. This is the moment when a buffer is written into the circular buffer in public void Write(float[] data, int offset, int count). So maybe at the end of this method.

I also am not sure about all the locks and how they would affect existing implementations. Maybe it would be better to make a new ThreadSafeCircularBuffer to avoid overhead for other usages of the circular buffer where no threading lock is necessary.

@sebllll
Copy link

sebllll commented Feb 25, 2025

maybe consider handing the data as an observable?
instead of locking mechanisms, the situation would be handled by HoldLatest.
that would additionally offer the user to add further audio-analysis-logic in that audio-thread by using a foreach (reactive).

@grufkork
Copy link
Author

CircularBuffer is not used by anything except the FFT module at the moment, in fact only the ReadDouble method is ever referenced. (The Read function could also be improved by doing two array copies rather than element-wise copy) CircularPullBuffer is used a lot more it seems. Maybe the plain CircularBuffer could just be renamed?

I'm not sure how I'm supposed to implement it on the audio thread without doing excessive processing (assuming it's still run in a separate thread). Something has to trigger the whole copy, process, present thread. Ideally, if the the FFT takes n μs, you'd start the processing n μs before the next frame update so you have the absolutely freshest data ready. You could run it continuously which would almost saturate a CPU core, but for a 2ms FFT this means a latency between 0-2ms, average 1ms. You could potentially do some kind of heuristic for when to start the FFT, but if you underestimate the time you'll worst-case have 2 frames latency/frame skips if you miss it. I do agree that (@60fps) ~17ms extra latency is pretty bad - just by itself it's getting close to what humans can differentiate. Is there something obvious I'm misunderstanding in how it's supposed to be synced?

I did consider if you could all of this in the VL, but I have not actually done any asynchronous programming within vvvv yet. It would allow for greater flexibility, on the other hand my non-qualified guess is that a couple of locks (which AFAIK are basically free, as long as you're not waiting for them which only happens for as long as it takes to copy an array) is faster than vvvv's async methods - though presumably they are not particularly heavy either since it's all C# in the end. To me it does seem better to solidly implement such performance critical things right away in C# but please fill me in on the subject!

@sebllll
Copy link

sebllll commented Feb 26, 2025

i think you misunderstood my proposal. let me try to explain better.

in this branch i modified your code in a minimal way to return the fft data as observable.
https://github.com/sebllll/VL.Audio/tree/FFT-reactive

this is how that looks like implemented in vvvv:
image

as you can see, we have even less running in the mainloop (just the Trigger), need to care less about syncing and have the ability to do additional processing in the analysis thread.

i also changed the helppatch in that branch, so you could have a look.

@grufkork
Copy link
Author

Hm, I see. I realise now that Reactive is a C# feature (Rx), not just a vvvv concept! I think I understand it a bit better now. How is the performance of using Reactive compared to just simple locks? I figure Reactive has more overhead since it's a more extensive/complex system, but I have no fathom of what kind of scale we are talking. It could be worth it for being able to easily run additional logic inside vvvv though. Maybe a separate FFT (Reactive) node could be added?

As I understand this does not however solve the frame-delay issue, am I right? I figure it might be ok in the meantime regardless, as the current implementation has except for anything <=256 samples already a higher latency + loads of jitter depending on how the FFT/render frames line up. I'll try adding a toggle for adding a trigger that runs on-new-samples-available, which would provide the theoretically lowest latency at the cost of CPU load.

@tebjan
Copy link
Member

tebjan commented Feb 28, 2025

First, I want to say thank you for improving the FFT situation!

Here is an idea for how the frame delay issue could be solved:
You want to trigger the FFT calculation as soon as possible but not faster than it can be calculated (when the sizes are huge).
So you can have a simple audio sample counter that knows how much audio time has progressed. Audio time is usually exact as you know the sample rate. Whenever the given time is up, the audio thread triggers the FFT thread. If the thread is idle, it will start; if it is busy, it will start right after it has finished.

The sample counter could be at the end of the Write method of the circular buffer, where it waits for audio buffers and counts the incoming samples.

Regarding the new thread lock modifications of the CircularBuffer, I'd separate that, even though the current version is not in use anywhere else. The existing one might become handy for other audio transformations or in case we keep the current state of FFT as a legacy node. Then it needs to have the exact same code paths as it has right now.

@grufkork
Copy link
Author

grufkork commented Mar 1, 2025

Actually coded up just that a today but didn't push :P

The new commit makes it so the FFT node has a pin which chooses whether to run frame-synced, which is CPU-optimal but has a 1-frame delay, or to run as quickly as possible as long as new data has arrived. For the second mode, the trigger is set every time new data is written to the buffer. This means if the calculation is already running, it will just be primed to run again as soon as it's ready. Being able to toggle it with a pin is nice for debug/release-type work.

There's definitely a question in there how to handle the change, as to not make it breaking. You can add new pins without breaking, but there's still the fact that the behaviour has changed. I see for most projects it would just be an improvement, but there might be a just a few that rely on the specific way the old FFT worked - is this significant enough to warrant keeping the old FFT node in, without a name change? I think moving forward is the better option - old projects probably run on older versions of vvvv regardless.

As for reactive or not: I think one reason to not make the standard FFT reactive, is that for new users the FFT should "just work" without having to fiddle with additional nodes. I'm all for adding a separate reactive version of it though.

@tebjan
Copy link
Member

tebjan commented Mar 7, 2025

Great. For legacy nodes, we have the Obsolete feature. The original node gets renamed into FFT (Old1) and a conversion utility will replace it in files saved prior to the first version that includes the new node.
Another way would be to make a new FFT (Threaded) node with the new code paths.

Did you already push the changes? Is it still in its thread and gets signalled by the Write method so that it doesn't block the audio thread?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants