Live demo of LocalVQE: Tiny ~1M param audio model that cancels echo and noise in realtime
Posted by richiejp@reddit | LocalLLaMA | 10 comments
basil232@reddit
Nice to see new models in this space. How much delay or time shift can the model tolerate when canceling echo? I would love to try this to cancel music or sound from, for example, a TV in the mic recordings of my smart speaker. However, the recordings would not be perfectly aligned. Can you say something about the "maximum echo distance"?
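The thread does not say how much misalignment LocalVQE tolerates. One common workaround, sketched here under that caveat, is to estimate the bulk delay between the reference signal (e.g. the TV audio) and the mic recording by cross-correlation, then shift the reference before echo cancellation. The function name `estimate_delay`, the toy signals, and the `max_lag` search window are illustrative assumptions, not part of LocalVQE.

```python
def estimate_delay(reference, mic, max_lag):
    """Return the lag in samples (0..max_lag) at which `mic` best matches
    a delayed copy of `reference`, using plain cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the mic signal shifted back by `lag`.
        score = sum(r * m for r, m in zip(reference, mic[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data (hypothetical): the mic hears an attenuated copy of the
# reference, delayed by 3 samples.
reference = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0]
true_delay = 3
mic = [0.0] * true_delay + [0.6 * x for x in reference[: len(reference) - true_delay]]

lag = estimate_delay(reference, mic, max_lag=5)

# Pre-align the reference by padding it with `lag` leading zeros.
aligned_reference = ([0.0] * lag + reference)[: len(mic)]
```

In practice the search window would be chosen from the largest delay expected in the setup, and real recordings would need a longer correlation window than this toy example.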
BobDerFlossmeister@reddit
How does this fare against DeepFilterNet?
Also the git repo linked in the model card doesn't seem to be accessible: https://github.com/LocalAI-io/LocalVQE
richiejp@reddit (OP)
Ah, that link is some slop; the repo is https://github.com/localai-org/LocalVQE
DeepFilterNet doesn't do echo suppression as far as I can tell. I don't know how the quality compares for noise suppression or how large DeepFilterNet is.
Silver-Champion-4846@reddit
How was this trained? IINNNTERESTINGGGGG
richiejp@reddit (OP)
On my 16 GB NVIDIA RTX 5070 Ti, with a lot of PyTorch profiling.
Silver-Champion-4846@reddit
How would you rate it against the industry standard (Krisp, which is what Discord uses)?
richiejp@reddit (OP)
I'm not sure whether I have ever used Krisp. However, I suspect that the cloud models are either generative or have a generative layer that rebuilds the speaker's voice without interference from background noise. LocalVQE just applies a mask, which is a lot faster than a diffusion or transformer layer, but with a mask (and not a very high-resolution one) the noise bleeds into the voice.
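To illustrate the bleed-through point, here is a minimal toy sketch of mask-based suppression on a single spectral frame. It is not LocalVQE's code: the function `apply_band_mask`, the 8-bin frame, and the band layout are all hypothetical, and the thread does not state the model's actual mask resolution. The idea is only that a mask multiplies each frequency bin by a gain, so when one gain covers several bins, noise sharing a band with speech cannot be removed without also removing the speech.

```python
def apply_band_mask(noisy_spec, band_gains, bins_per_band):
    """Scale each frequency bin by the gain of the band it falls into."""
    out = []
    for k, value in enumerate(noisy_spec):
        band = min(k // bins_per_band, len(band_gains) - 1)
        out.append(value * band_gains[band])
    return out


# Toy magnitudes for one frame (hypothetical): speech sits in bin 0,
# noise sits in bins 1 and 6.
speech = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
noise = [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0]
noisy = [s + n for s, n in zip(speech, noise)]

# One gain per bin: the mask can keep bin 0 and zero everything else.
fine = apply_band_mask(noisy, [1, 0, 0, 0, 0, 0, 0, 0], bins_per_band=1)

# One gain per pair of bins: keeping the speech in bin 0 also keeps the
# noise in bin 1, because both share band 0.
coarse = apply_band_mask(noisy, [1, 0, 0, 0], bins_per_band=2)
```

With the per-bin mask the output equals the clean speech frame; with the coarser mask the noise in bin 1 survives, while the noise in bin 6, which shares no band with speech, is still removed.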
Silver-Champion-4846@reddit
Hope someone makes a production-level app for this
jreoka1@reddit
I'm assuming this is probably better than RNNoise?
dariomory@reddit
Yeah, I think RNNoise only handles noise suppression; this seems to handle echo cancellation and noise reduction in one model.