Semantic routing and caching don't work - task-specific LLMs (TLMs) ftw!
Posted by AdditionalWeb107@reddit | LocalLLaMA | 9 comments
If you are building caching for LLM responses, or a router that hands certain queries off to specific LLMs/agents, know that semantic caching and routing is a broken approach. Here is why.
- Follow-ups or Elliptical Queries: Same issue as embeddings — "And Boston?" doesn't carry meaning on its own. Clustering will likely put it in a generic or wrong cluster unless context is encoded.
- Semantic Drift and Negation: Clustering can't capture logical distinctions like negation, sarcasm, or intent reversal. "I don't want a refund" may fall in the same cluster as "I want a refund" (see the similarity sketch after this list).
- Unseen or Low-Frequency Queries: Sparse or emerging intents won’t form tight clusters. Outliers may get dropped or grouped incorrectly, leading to intent “blind spots.”
- Over-clustering / Under-clustering: Setting the right number of clusters is non-trivial. Fine-grained intents often end up merged unless you do manual tuning or post-labeling.
- Short Utterances: Queries like “cancel,” “report,” “yes” often land in huge ambiguous clusters. Clustering lacks precision for atomic expressions.
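To make the negation bullet concrete, here is a minimal sketch, assuming the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint (any off-the-shelf embedder shows a similar effect):

```python
# Sketch of the negation failure mode. Assumes sentence-transformers is
# installed; the model choice is illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
a = model.encode("I want a refund", convert_to_tensor=True)
b = model.encode("I don't want a refund", convert_to_tensor=True)

# Cosine similarity is typically high here: a cache or router keyed on
# embedding distance would treat these opposite intents as near-duplicates.
print(util.cos_sim(a, b).item())
```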
What can you do instead? You are far better off using an LLM and instructing it to predict the scenario for you (e.g. "here is a user query; does it overlap with this list of recent queries?"), or building a very small, highly capable TLM (task-specific LLM).
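A hedged sketch of the first option, using the OpenAI client as a stand-in; the model name, prompt, and JSON shape are placeholders, not OP's actual implementation:

```python
# Ask an LLM to judge overlap with recent queries instead of relying on
# embedding distance. All names here are illustrative.
import json
from openai import OpenAI

client = OpenAI()

def judge(query: str, recent: list[str]) -> dict:
    prompt = (
        "Recent queries:\n" + "\n".join(f"- {q}" for q in recent) + "\n\n"
        f"New query: {query}\n"
        "Does the new query restate one of the recent queries (cache hit), "
        "or is it a new intent that should be routed?\n"
        'Reply as JSON: {"cache_hit": true/false, "matched_query": "...", "intent": "..."}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; OP's point is a small task-specific LM
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# The elliptical follow-up resolves because the model sees the history.
print(judge("And Boston?", ["What's the weather in NYC?"]))
```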
For agent routing and hand-off, I've built a guide on how to do this via my open-source project on GH. If you want to learn about my approach, drop me a comment.
UnreasonableEconomy@reddit
but anon, many embedding models ARE llms.
You can even prompt some embedding models (post-query contextualization).
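For example, a sketch assuming sentence-transformers and the intfloat/e5-base-v2 prefix convention; instruction-tuned embedders (GTE, Instructor, etc.) take a task string similarly:

```python
# The "query:" / "passage:" prefixes condition the embedding on its role,
# which is a crude form of prompting the embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")

q = model.encode("query: And Boston?", convert_to_tensor=True)
p = model.encode("passage: Weekend weather forecast for Boston", convert_to_tensor=True)
print(util.cos_sim(q, p).item())
```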
One issue is that the embedding world kinda fell asleep about a year ago, and there doesn't seem to be much interest in (or understanding of) what these models can or cannot do.
VLM embedding models, 30B/70B dense embedding models, or LLMs that can be embedding-sampled (like the davinci embeddings) are absolutely what we need.
It's also a non-issue (or rather, it's solvable with occlusion).
Imagine you're making a computer game. Like in Unreal or something.
You can look at your embedding vector as a camera coordinate, and think of your embedding results as an n+1-dimensional fisheye perspective of your hypersphere geometry. The problem with clustering is that you're thinking like an n+1 entity.
You need to shave off that extra dimension and "render" your "view" in something that is logical in n dimensions. One way to do that is by creating the equivalent of a z-buffer/occlusion map. Your LLM can then look around in that semantic space and decide if there's anything worthwhile, but there's no sense in giving it a tomographic view of its world (signal to noise ratio).
I know this is kinda abstract, hope this makes some sort of sense. There was a post about this on the OpenAI forums a while ago.
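One concrete, heavily interpreted reading of the z-buffer idea, close in spirit to maximal-marginal-relevance filtering; every name in this sketch is illustrative, not from that forum post:

```python
# Keep the nearest point along each bearing from the "camera" (the query
# embedding) and let it occlude anything behind it in a similar direction.
import numpy as np

def occlusion_filter(query: np.ndarray, points: np.ndarray, cone: float = 0.9) -> list[int]:
    """Return indices of points not hidden behind a nearer point."""
    offsets = points - query                      # bearings from the camera
    dist = np.linalg.norm(offsets, axis=1)
    dirs = offsets / dist[:, None]
    kept: list[int] = []
    for i in np.argsort(dist):                    # nearest first, like a z-buffer
        # Occluded if a nearer kept point lies within the same angular cone.
        if all(dirs[i] @ dirs[j] < cone for j in kept):
            kept.append(int(i))
    return kept

# Toy usage: the farther of two nearly collinear points gets dropped.
rng = np.random.default_rng(0)
print(occlusion_filter(np.zeros(8), rng.normal(size=(20, 8))))
```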
Not_your_guy_buddy42@reddit
OP's post and your comment are making me want to play with a tree of classifiers and coreference resolution.
Accomplished_Mode170@reddit
Same but can y’all clarify the objection for myself and posterity?
I.e., if we have heuristics validated stepwise across an event chain, we're testing for values in dimensions absent from the model's actual latent space.
Not_your_guy_buddy42@reddit
Yes, semantic-only misses context-dependent meaning. A tree can make routing decisions hierarchically - boring old tree, just semantic. E.g. 'I want a refund' vs. 'I don't want a refund': 1. identify the "refund" part, 2. check negation (dot-product similarity might be enough), 3. ???, 4. profit. Stepwise classification builds up understanding incrementally. Just clustering makes no sense; the post is a socials strawman lol, but I still dig OP's project
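A minimal sketch of that stepwise tree, assuming sentence-transformers; the anchors and threshold are invented, and whether dot product really suffices for the negation step is exactly the open question above:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def route(query: str) -> str:
    q = model.encode(query, convert_to_tensor=True)
    # Step 1: identify the "refund" part.
    topic = util.cos_sim(q, model.encode("refund", convert_to_tensor=True)).item()
    if topic < 0.3:
        return "other"
    # Step 2: check negation against paired anchors; the nearer anchor wins.
    pos = util.cos_sim(q, model.encode("I want a refund", convert_to_tensor=True)).item()
    neg = util.cos_sim(q, model.encode("I don't want a refund", convert_to_tensor=True)).item()
    return "refund_requested" if pos >= neg else "refund_declined"

print(route("I want a refund"), route("I don't want a refund"))
```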
ShengrenR@reddit
For the 'follow-ups' - this is just a case for prompt pre-processing before you send the query along for classification/routing, same reason you wouldn't run RAG on that statement alone: you pass the conversation context, ask for a rephrase/HyDE/whatever, and route/embed on that.
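A hedged sketch of that pre-processing step; the model name and prompt wording are placeholders:

```python
# Rewrite an elliptical follow-up into a standalone query before
# embedding/routing it. All names here are illustrative.
from openai import OpenAI

client = OpenAI()

def contextualize(history: list[str], query: str) -> str:
    prompt = (
        "Conversation so far:\n" + "\n".join(history) + "\n\n"
        f"Rewrite the last user message as a standalone query: {query}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# "And Boston?" should come back as something like "What's the weather in Boston?"
print(contextualize(
    ["user: What's the weather in NYC?", "assistant: Sunny, 75F."],
    "And Boston?",
))
```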
AdditionalWeb107@reddit (OP)
And that's the point - you have to build and maintain all that plumbing code, or you could use a task-specific LLM that does it in one shot and is trained for routing scenarios. Faster, cheaper, and effective.
Accomplished_Mode170@reddit
This is the part folks are missing; a distilled task-specific LM doesn’t require UMAPing your JSON
Like, everything with shared context is going to inherently embed the same way.
If we're going for representative dimensionality in a shared latent space, you can also use BERT-style classifiers.
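A sketch of that route, assuming Hugging Face transformers, with a zero-shot NLI pipeline standing in for a fine-tuned BERT intent head; the label set is invented for illustration:

```python
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = ["refund_requested", "refund_declined", "weather_query"]

# Scores come back per label; route on the argmax.
print(clf("I don't want a refund", candidate_labels=labels))
```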
Accomplished_Mode170@reddit
Ha! Had this opened as a tab all day; came back and was like, ‘Wait that sounds like ArchGW!’
Glad to find y’all here making it accessible!
Folks like yourself, ngrok, Unsloth, et al. building AI-native microservices make me optimistic about the future we're building.
Chromix_@reddit
For those who didn't immediately catch what the bullet points in the post are aiming at: OP is making a case for routing (classifying) user requests using a fast, specialized LLM, over traditional, non-LLM-based methods. You can find a diagram of the flow at the beginning of the project page.
By the way: the example code / prompt there is entertaining.