In my opinion the answer for the question is: not yet, as far as current general purpose(non-real time) operating systems and their APIs are concerned. Most of them appear to over do the idea of playing back of a sound buffer. Rightly so, since even the fastest CPUs will hiccup on playback if playback is done with a bad design. But as time progresses single thread performance has risen and multi-thread performance tends to be better if done correctly but there are too many ways to acutally do it.
We have always considered multi-threaded designs superior for audio playback and processing for the mere reason of we just can do more in a given amount of time. Simple playback as been fluid for years on CPUs that are generally available, even on phones and other small devices. Playback with live DSP is a whole different problem, Vrok(older versions) tried to answer it with a 2 thread model with a decoder thread and a playback thread. DSP is done on the decoder thread; the design had little reason for doing so. The only reason was why not! It worked well, but as the CPUs that it was run on got weaker the buffer had to be made bigger to reduce the artifacts. It added a notable amount of latency to the audio.
Even with years of DSP on audio, almost everyone in the audio business has been doing it linearly: one effect after another. There’s very less reason for this choice, right now I can’t think of one other than the ease of implementation. Non-linear DSP should be great! Atleast in theory. However, this is not the first time that someone thought of it, JACK(JACK Audio Connection Kit) offers a great implementation on supported operating systems. JACK is a bulky system and is recommended for people with some knowledge in sound servers and patch bays, it is a no go for someone who is only going to listen to some music in their free time. Therefore why not a mini-JACK like system? Which is faster, portable and extendable.
Communication between threads are done by two queues which are built to work fast (preferably lock-free). Memory allocation and deallocation could be done for buffer management but there’s two things that’ll add some overhead, every allocation and deallocation would need to be synced and a big amount of allocations and deallocations for long periods of time will have an adverse affect on the allocator because of segmentation (you may think of this as over engineering but if run on a system with low resources these might be significant). Preallocating everything before you use it and reuseing what you’ve allocated is always the good approach. Therefore, I chose to implement so.
If every node is run on a different thread, there will be a lot of threads at sometime and what’s slowing things down would be their own contention. On an N core system, if HyperThreading is enabled there is no way that you are going to get more than 2N threads running at the sametime. Therefore it’s useful to have a mini scheduler that will schedule work of every node without the overhead of a lot of threads. Each thread will run a preset amount of processing functions. This is no way a finalized design, however it has the potential to make use of whatever hardware there is to its maximum without forcing the DSP or the playback to be of low quality.
The implementation so far,
Working Binaries (Windows & Linux versions work similar to shells, more information on the repo README)