turboquant: on-device search and recommendation

Posted by init0@reddit | LocalLLaMA

https://h3manth.com/ai/cinematch/

TurboQuant is a new quantization algorithm from Google Research that applies a random rotation to high-dimensional vectors to smooth out outliers, enabling extreme low-bit compression with near-zero accuracy loss.
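For intuition, here's a toy TypeScript sketch of the rotation trick. This is my own simplification, not TurboQuant's actual construction (a real implementation would use a fast structured rotation rather than the dense Gram-Schmidt matrix below, and all names here are illustrative), but it shows the key property: an orthogonal rotation preserves a vector's norm while spreading an outlier's energy across every coordinate, which is what makes aggressive scalar quantization accurate afterwards.

```typescript
// Small deterministic PRNG so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Random orthogonal matrix: Gram-Schmidt on Gaussian rows.
// (Toy-sized only -- O(dim^3); fast structured rotations avoid this cost.)
function randomRotation(dim: number, rng: () => number): number[][] {
  const rows: number[][] = [];
  for (let r = 0; r < dim; r++) {
    // Start from a Gaussian vector (Box-Muller transform).
    let v = Array.from({ length: dim }, () => {
      const u1 = Math.max(rng(), 1e-12);
      return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rng());
    });
    // Orthogonalize against previous rows, then normalize.
    for (const p of rows) {
      const d = v.reduce((s, x, i) => s + x * p[i], 0);
      v = v.map((x, i) => x - d * p[i]);
    }
    const n = Math.hypot(...v);
    rows.push(v.map((x) => x / n));
  }
  return rows; // orthonormal rows => y = R v preserves the L2 norm
}

// Apply the rotation: y[r] = sum_i R[r][i] * v[i].
function rotate(R: number[][], v: number[]): number[] {
  return R.map((row) => row.reduce((s, x, i) => s + x * v[i], 0));
}
```

Rotating a vector whose energy sits in one huge coordinate yields a vector with the same norm but a much flatter per-coordinate range, so a handful of quantization levels per coordinate suffices.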

While it is currently making waves for shrinking LLM KV caches, I wanted to see how it handles semantic search on device!

I’ve integrated it into a client-side recommendation demo (CineMatch) that runs entirely on-device.

Here is how the engine drives the architecture:

- 6x Compression: TurboQuant applies its randomized rotation and 3-bit scalar quantization to crush 384-dim Float32 embeddings from 1,536 bytes down to just 249 bytes.

- Micro-Payloads: Because of that density, the entire vectorized movie index ships instantly to the client as a lightweight ~12 KB JSON file.

- WASM SIMD Execution: We don't even decompress at runtime. The browser computes dot products directly against the compressed vectors using WebAssembly SIMD.

- Zero-Jank Matching: Top-K cosine similarity runs in ~13 ms, staying well under the 16 ms budget for a flawless 60 fps experience, without a single server roundtrip.
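To make the bullets above concrete, here's a minimal TypeScript sketch of a per-vector min/max 3-bit scalar quantizer plus scoring that never expands the codes back to floats. This is my own simplification, not CineMatch's or TurboQuant's actual code, and all names are illustrative. Note the byte math: the packed codes for 384 dims come to 384 × 3 / 8 = 144 bytes, with the remainder of the post's 249-byte figure presumably going to scales and other per-vector metadata.

```typescript
type Quantized = { codes: Uint8Array; min: number; scale: number };

// Pack each coordinate into 3 bits (8 levels): 384 dims -> 144 code bytes.
function quantize3bit(v: number[]): Quantized {
  const min = Math.min(...v);
  const max = Math.max(...v);
  const scale = (max - min) / 7 || 1; // codes 0..7; guard against flat vectors
  const codes = new Uint8Array(Math.ceil((v.length * 3) / 8));
  v.forEach((x, i) => {
    const q = Math.round((x - min) / scale) & 7;
    const bit = i * 3;
    codes[bit >> 3] |= (q << (bit & 7)) & 0xff;
    // A 3-bit code can straddle a byte boundary; spill the high bits over.
    if ((bit & 7) > 5) codes[(bit >> 3) + 1] |= q >> (8 - (bit & 7));
  });
  return { codes, min, scale };
}

// Score without decompressing: dot(v, query) with v_i = min + q_i * scale
// expands to min * sum(query) + scale * sum(q_i * query_i), so the loop only
// ever reads the packed 3-bit codes.
function dotQuantized({ codes, min, scale }: Quantized, query: number[]): number {
  let acc = 0;
  let qsum = 0;
  for (let i = 0; i < query.length; i++) {
    const bit = i * 3;
    let q = codes[bit >> 3] >> (bit & 7);
    if ((bit & 7) > 5) q |= codes[(bit >> 3) + 1] << (8 - (bit & 7));
    acc += (q & 7) * query[i];
    qsum += query[i];
  }
  return min * qsum + scale * acc;
}

// Brute-force top-K over the compressed index (the demo's WASM SIMD kernel
// would replace this scalar inner loop).
function topK(index: Quantized[], query: number[], k: number): number[] {
  return index
    .map((item, id) => ({ id, score: dotQuantized(item, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.id);
}
```

For unit-norm embeddings the dot product is the cosine similarity, so `topK` here corresponds directly to the Top-K cosine step in the last bullet.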

Pushing advanced quantization algorithms natively into the browser unlocks massive potential for privacy-first, zero-compute-cost AI.