Folks, any views on using LLMs like Gemma 3 12B/27B for embeddings?
Posted by Leather-Departure-38@reddit | LocalLLaMA | View on Reddit | 8 comments
Folks, I was wondering whether we can use Gemma 3 12B (in principle we can) for vectorizing documents for later search. I know there are open-source embedding models like nomic and all-MiniLM. I was just wondering if you are open to discussing this: embedding models like nomic vs. LLMs like Gemma for embeddings.
Rukelele_Dixit21@reddit
What are embedding models? How are they different from LLMs? Don't LLMs also make embeddings internally?
phree_radical@reddit
Take a look at the output logits for a given context; that will pretty much give you a high-level idea of what information is contained in a single embedding: one token's worth of information, with just enough context to predict one token.
EmbarrassedKey3002@reddit
how do you see the logits for a given context?
phree_radical@reddit
Depends on what engine you're using. With PyTorch it's just the output of model(); if it's one of the chatbot engines compatible with the OpenAI protocol, I believe there's a common way to request logits (logprobs) through that API.
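A minimal sketch of the PyTorch route, using Hugging Face transformers (the checkpoint name and top-k size are just placeholder assumptions, and you may need device_map/quantization for the larger Gemma checkpoints):

```python
# Inspect the next-token logits a causal LM produces for a given context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-3-12b-it"  # assumed checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

text = "The capital of France is"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)            # out.logits: (batch, seq_len, vocab_size)

last_logits = out.logits[0, -1]      # logits for the next-token prediction
top = torch.topk(last_logits, k=5)   # the 5 most likely next tokens
for score, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()]):>12s}  {score.item():.2f}")
```

With OpenAI-protocol servers, the equivalent is usually the `logprobs` / `top_logprobs` fields on the completion request, if the backend supports them.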
gmork_13@reddit
Part of the trick is finding the correct layer to tap for the embedding, as later layers steer towards predicting the next token. Conversely, tap too early and the model only has a simplified understanding of the input.
Take a bunch of texts where some of them share content in some way (so they ought to be embedded close), and some texts that ought to be embedded further apart.
Run them through Gemma and take the output of every 5th layer or so. Sum and average across input length, so you get (1,d) vectors from your sequence.
Use cosine similarity or some other measure to see whether the texts are embedded close together or far apart where they're supposed to be. Check which layer's output gives you the best clustering for the same-content pairs and the best disentanglement for the non-similar pairs.
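A rough sketch of that layer-probing idea with Hugging Face transformers (the checkpoint name, the example texts, and the choice of mean pooling are illustrative assumptions, not a prescribed recipe):

```python
# Mean-pool every hidden layer into one vector and compare pairs per layer.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "google/gemma-3-12b-it"  # assumed; substitute whatever you run locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def layer_embeddings(text):
    """Return one mean-pooled (d,) vector per hidden layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    mask = inputs["attention_mask"].unsqueeze(-1)          # (1, seq, 1)
    # out.hidden_states is a tuple of (1, seq, d) tensors, one per layer
    return [(h * mask).sum(dim=1).squeeze(0) / mask.sum() for h in out.hidden_states]

similar_pair    = ("The cat sat on the mat.", "A cat is sitting on a rug.")
dissimilar_pair = ("The cat sat on the mat.", "Quarterly revenue grew by 8%.")

emb_a, emb_b = map(layer_embeddings, similar_pair)
emb_c, emb_d = map(layer_embeddings, dissimilar_pair)

for layer in range(0, len(emb_a), 5):                      # every 5th layer, as suggested
    sim  = F.cosine_similarity(emb_a[layer], emb_b[layer], dim=0).item()
    dsim = F.cosine_similarity(emb_c[layer], emb_d[layer], dim=0).item()
    print(f"layer {layer:2d}: similar={sim:.3f}  dissimilar={dsim:.3f}")
```

The layer where the gap between the "similar" and "dissimilar" scores is widest is the one you'd tap.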
Or just use an embedding model.
Expensive-Paint-9490@reddit
There are embedding models like bge-multilingual-gemma2, which are based on Gemma 2.
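For reference, a minimal sketch of using that checkpoint through sentence-transformers (assuming the BAAI/bge-multilingual-gemma2 repo and omitting the retrieval instruction prompt its card recommends for queries):

```python
# Encode documents and a query, then rank by cosine similarity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-multilingual-gemma2")

docs = ["Gemma-based embedding models exist.", "The weather is nice today."]
query = "Are there embedding models built on Gemma?"

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

scores = doc_emb @ query_emb   # cosine similarity, since vectors are normalized
print(scores)
```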
segmond@reddit
Yes, you can use any model for embedding. The only challenge with a larger model is that it's slower than, say, a 1B model, so if you have a huge dataset you have to wait longer both to embed documents for storage and to search for results. The LLMs are smarter than the small specialized embedding models, so their quality is better. It's a trade-off sort of thing.
Leather-Departure-38@reddit (OP)
Makes sense to me, thanks for your comment.