EmbeddingGemma - 300M parameter, state-of-the-art for its size, open embedding model from Google
Posted by curiousily_@reddit | LocalLLaMA | 66 comments
Weights on HuggingFace: https://huggingface.co/google/embeddinggemma-300m
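A minimal usage sketch with sentence-transformers, assuming a recent version of the library that supports EmbeddingGemma (Google's blog shows it loading this way):

```python
# Sketch: encode a couple of sentences and get dense vectors back.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")
embeddings = model.encode([
    "Which planet is known as the Red Planet?",
    "Mars is often called the Red Planet for its reddish appearance.",
])
print(embeddings.shape)  # (2, 768) -- full output dimension
```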
johntdavies@reddit
Always good to see new models, and this looks pretty good. I see from the comparisons on the model card that it’s not as “good” as Qwen3-Embedding-0.6B though. I know Gemma is only half the size, but that’s quite a gap. Still, I look forward to trying it out; another embedding model will be very welcome.
cnmoro@reddit
Just tested it on my custom RAG bench for Portuguese and it was really bad :(
ivoencarnacao@reddit
Do you recommend any embedding model for Portuguese?
cnmoro@reddit
This one: https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe
Or my distilled version (a static model, if you need speed over quality): https://huggingface.co/cnmoro/nomic-embed-text-v2-moe-distilled-high-quality
ObjectiveOctopus2@reddit
Fine tune it for Portuguese
ivoencarnacao@reddit
I'm looking for an embedding model for a RAG project in Portuguese, better than all-MiniLM-L12-v2. This one might be the way to go, but I think it's too soon to tell!
secsilm@reddit
The Google blog says "it offers customizable output dimensions (from 768 to 128 via matryoshka representation)". Interesting, variable dimensions. First time hearing about it.
Common_Network@reddit
bruh, MRL (Matryoshka Representation Learning) has been out for the longest time; even nomic-embed supports it
secsilm@reddit
Never used it. In your opinion, is it better than a normal fixed dimension?
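For anyone unfamiliar: with MRL the leading dimensions carry most of the information, so you can truncate the vectors and renormalize, trading some quality for a much cheaper index. A minimal sketch, assuming the `truncate_dim` option available in recent sentence-transformers versions:

```python
# Sketch: same model, but keep only the first 128 dimensions (MRL).
from sentence_transformers import SentenceTransformer

small = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=128)
vec = small.encode("a query about local embedding models")
print(vec.shape)  # (128,) -- smaller index, slightly lower quality
```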
maglat@reddit
nomic-embed-text:v1.5 or this one? which one to use?
Common_Network@reddit
based on the charts alone, gemma is better
sanjuromack@reddit
Depends on what you need it for. Nomic is really performant, its context length is 4x longer, and it has image support via nomic-embed-vision:v1.5.
curiousily_@reddit (OP)
Too new to tell, my friend.
Away_Expression_3713@reddit
What do people actually use embedding models for? Like, I know the applications, but how do they actually help in practice?
igorwarzocha@reddit
Apart from obvious search engines, you can put it in between a bigger model and your database as a helper model. A few coding apps have this functionality; unsure if this actually helps or confuses the LLM even more.
I tried using it as a "matcher" for description vs keywords (or the other way round, can't remember) to match an image from a generic assets library to the entry, without having to do it manually. It kinda worked, but I went with bespoke generated imagery instead :>
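A rough sketch of that matcher idea; the asset keywords and description here are made up for illustration:

```python
# Sketch: score each asset's keyword string against a description and
# pick the best match by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

description = "a foggy mountain road at dawn"
asset_keywords = ["city skyline night", "mountain fog sunrise road", "beach palm trees"]

desc_vec = model.encode(description, convert_to_tensor=True)
asset_vecs = model.encode(asset_keywords, convert_to_tensor=True)

scores = util.cos_sim(desc_vec, asset_vecs)  # shape (1, num_assets)
best = scores.argmax().item()
print(asset_keywords[best])  # -> "mountain fog sunrise road"
```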
horsethebandthemovie@reddit
Which coding apps do you know of that use this kind of thing? I've been interested in trying something similar but haven't had the time; it's always hard to tell what $(random agent cli) is actually doing.
igorwarzocha@reddit
Yeah, they do it, but... I would recommend against it.
AI-generated code moves too fast; you NEED TO re-embed every file after every write-tool call, and the LLM would need to receive an update from the DB every time it wants to read a file.
People can think whatever they want, but I see it as context rot and a source of potentially many issues and slowdowns. It's mostly marketing AI-bro hype when you logically analyse it against the current limitations of LLMs. (I believe I saw Boris from Anthropic corroborating this somewhere, while explaining why CC is relatively simple.)
Last time I remember trying a feature like this, it was in Roo, I believe. Pretty sure this is also what Cursor does behind the scenes?
You could try Graphiti MCP, or the simplest and best idea: code a small script that creates an .md codebase map with your directory tree and file names (sketch below). @ it at the beginning of your sesh, and rerun & @ it again when the AI starts being dumb.
Hope this helps. I would avoid getting too complex with all of it.
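A minimal take on that codebase-map script, assuming a plain directory walk is enough (the skip list and output filename are just choices):

```python
# Sketch: walk the repo and write the directory tree plus file names
# to codebase.md, skipping common noise directories.
import os

SKIP = {".git", "node_modules", "__pycache__", ".venv"}

with open("codebase.md", "w", encoding="utf-8") as out:
    out.write("# Codebase map\n\n")
    for root, dirs, files in os.walk("."):
        dirs[:] = [d for d in dirs if d not in SKIP]  # prune noise in place
        depth = root.count(os.sep)
        out.write(f"{'  ' * depth}- {os.path.basename(root) or '.'}/\n")
        for name in sorted(files):
            out.write(f"{'  ' * (depth + 1)}- {name}\n")
```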
aeroumbria@reddit
Train diffusion models on generic text features as conditioning
plurch@reddit
Currently using embeddings for repo search here. That way you can get relevant results when the query is semantically similar, rather than relying only on keyword matching.
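For illustration, a generic semantic-search sketch of the idea (not the linked tool's actual code; the documents are stand-ins):

```python
# Sketch: embed a corpus once, then rank documents against a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")
docs = ["fix null pointer in parser", "add OAuth login flow", "speed up CSV import"]
doc_vecs = model.encode(docs, convert_to_tensor=True)

query_vec = model.encode("authentication bug", convert_to_tensor=True)
hits = util.semantic_search(query_vec, doc_vecs, top_k=2)[0]
for hit in hits:
    print(docs[hit["corpus_id"]], hit["score"])  # "add OAuth login flow" ranks first
```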
sammcj@reddit
That's a neat tool! Is it open source? I'd love to have a hack on it.
plurch@reddit
Thanks! It is not currently open source though.
Former-Ad-5757@reddit
For me it is a huge filter between database and LLM.
In my database I can have 50,000 classifications for products; I can't feed an LLM that kind of volume.
I use embeddings to get the ~500 most similar classifications and then I let the LLM go over those 500.
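A sketch of that embed-then-LLM funnel; `load_classifications` is a hypothetical stand-in for reading the 50,000 labels from the database:

```python
# Sketch: vector similarity cuts 50,000 candidates down to a shortlist,
# and only the shortlist goes into the LLM prompt.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

classifications = load_classifications()  # hypothetical: ~50,000 label strings
class_vecs = model.encode(classifications, convert_to_tensor=True)  # precompute once

product = "stainless steel 1.5l electric kettle"
product_vec = model.encode(product, convert_to_tensor=True)

hits = util.semantic_search(product_vec, class_vecs, top_k=500)[0]
shortlist = [classifications[h["corpus_id"]] for h in hits]
# ...then prompt the LLM to pick the best label from `shortlist`.
```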
Consistent-Donut-534@reddit
Search and retrieval, and also for when you have another model that you want to condition on text values. It's easier to just use a frozen off-the-shelf embedding model and train your model around it.
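A minimal sketch of that pattern in PyTorch: freeze the embedder and train only a small head on top (the head architecture here is illustrative):

```python
# Sketch: frozen text embedder as a conditioning signal; only the head trains.
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("google/embeddinggemma-300m")  # stays frozen

class ConditionedHead(torch.nn.Module):
    def __init__(self, text_dim=768, out_dim=10):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(text_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, out_dim),
        )

    def forward(self, text_emb):
        return self.net(text_emb)

head = ConditionedHead()
with torch.no_grad():  # no gradients flow into the embedder
    cond = encoder.encode(["a red sports car"], convert_to_tensor=True)
logits = head(cond)  # only head parameters receive updates during training
```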
ChankiPandey@reddit
recommendations
-Cubie-@reddit
Mostly semantic search/information retrieval
a_slay_nub@reddit
It's smaller, but it seems a fair bit worse than Qwen3-Embedding-0.6B.
SkyFeistyLlama8@reddit
How about compared to IBM Granite 278m?
ObjectiveOctopus2@reddit
It’s a lot better than that one.
ObjectiveOctopus2@reddit
You could also say it’s almost as good at half the size
IntoYourBrain@reddit
I'm new to all this. Trying to learn about local AI and stuff. What would the use case for something like this be?
ObjectiveOctopus2@reddit
Long term memory
danielhanchen@reddit
I combined all Q4_0, Q8_0 and BF16 quants into 1 folder if that's easier for people! https://huggingface.co/unsloth/embeddinggemma-300m-GGUF
We'll also make some cool RAG finetuning + normal RAG notebooks if anyone's interested, over the next couple of days!
Optimalutopic@reddit
You can even plug the model in here and enjoy local Perplexity-style search, vibe podcasting, and much more; it has FastAPI, MCP, and Python support: https://github.com/SPThole/CoexistAI
steezy13312@reddit
Are the q4_0 and q8_0 versions you have here the QAT versions?
danielhanchen@reddit
Oh yes: BF16 and F32 are the original (non-QAT) weights; Q8_0 is the Q8_0 QAT one, and Q4_0 is the Q4_0 QAT one.
We thought it's better to just put them all into 1 repo rather than 3 separate ones!
steezy13312@reddit
Thanks - that makes sense to me for sure.
V0dros@reddit
Thank you kind sir
NoPresentation7366@reddit
Thank you so much for being so passionate, you're super fast 😎💗
ValenciaTangerine@reddit
I was just looking to GGUF it. Thank you!
Present-Ad-8531@reddit
please explain license
Beestinge@reddit
How does it compare to BERT? That is also embedding only.
cristoper@reddit
It is a Sentence Transformer model, which is basically BERT for sentences.
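To illustrate the difference: plain BERT returns one vector per token, while a Sentence Transformer pools those into one vector per sentence that is trained to compare well. A quick sketch, assuming both libraries are installed:

```python
# Sketch: per-token BERT outputs vs pooled sentence embeddings.
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
tokens = tok("hello world", return_tensors="pt")
print(bert(**tokens).last_hidden_state.shape)  # (1, num_tokens, 768)

st = SentenceTransformer("google/embeddinggemma-300m")
print(st.encode("hello world").shape)  # (768,) -- one vector per sentence
```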
Icy_Foundation3534@reddit
Is this license permissive? Can I use it to build an app i’m selling?
CheatCodesOfLife@reddit
If you're not going to read their (very restrictive) license, just use this one, man: Qwen/Qwen3-Embedding-0.6B.
TeeRKee@reddit
WOW
ResponsibleTruck4717@reddit
I hope they will release it for ollama as well.
blackhawk74@reddit
Already released:
https://ollama.com/library/embeddinggemma
https://github.com/ollama/ollama/releases/tag/v0.11.10
Plato79x@reddit
How do you use this with ollama? Not with just `ollama run embeddinggemma` I believe...
agntdrake@reddit
curl localhost:11434/api/embed -d '{"model": "embeddinggemma", "input": "hello there"}'
agntdrake@reddit
We made the bf16 weights the default, but the q4_0 and q8_0 QAT weights are called `embeddinggemma:300m-qat-q4_0` and `embeddinggemma:300m-qat-q8_0`.
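A Python equivalent of the curl call above, pointed at the QAT q4_0 tag mentioned here (assumes Ollama is running on the default port):

```python
# Sketch: request an embedding from Ollama's /api/embed endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "embeddinggemma:300m-qat-q4_0", "input": "hello there"},
)
print(resp.json()["embeddings"][0][:5])  # first few dims of the vector
```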
ResponsibleTruck4717@reddit
Thanks :)
-Cubie-@reddit
There's comparison evaluations here: https://huggingface.co/blog/embeddinggemma
Here are the English scores; the multilingual ones are in the blog post (I can only add one attachment).
JEs4@reddit
Looks like I know what I’m doing this weekend.
DAlmighty@reddit
It’s interesting that they left Qwen 3 embedding out of that chart.
the__storm@reddit
Qwen3's smallest embedding model is 600M (but it is better on the published benchmarks): https://developers.googleblog.com/en/introducing-embeddinggemma/
https://github.com/QwenLM/Qwen3-Embedding
DAlmighty@reddit
Yeah I edited my post right before I saw this.
-Cubie-@reddit
The blogpost by Google themselves does have Qwen3 in their Multilingual figure: https://developers.googleblog.com/en/introducing-embeddinggemma/
TechySpecky@reddit
What benchmarks do you guys use to compare embedding quality on specific domains?
-Cubie-@reddit
https://huggingface.co/spaces/mteb/leaderboard is the go-to
TechySpecky@reddit
I wonder if it's worth fine tuning these. I need one for RAG specifically for archeology documents. I'm using the new Gemini one.
Holiday_Purpose_3166@reddit
Qwen3 4B has been my daily driver for my large codebases since the models came out, and it's the most performant for its size. The 8B starts to drag; there's virtually no difference from the 4B except it's slower and more memory-hungry, although it has bigger embeddings.
I've been tempted to downgrade to shave memory and increase speed, as this model seems efficient for its size.
-Cubie-@reddit
Finetuning definitely helps: https://huggingface.co/blog/embeddinggemma#finetuning
TechySpecky@reddit
Oh interesting, they fine-tune with question/answer pairs? I don't have that; I just have 500,000 pages of papers and books. I'll need to think about how to approach that.
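One common workaround when you only have raw pages is to generate synthetic questions per chunk with an LLM and train with in-batch negatives. A sketch using sentence-transformers' fit API; `generate_question` is a hypothetical wrapper around whatever LLM you use:

```python
# Sketch: build (question, passage) pairs, then fine-tune with
# MultipleNegativesRankingLoss (other in-batch pairs act as negatives).
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("google/embeddinggemma-300m")

passages = ["...page text...", "...another page..."]  # your pages, chunked
pairs = [InputExample(texts=[generate_question(p), p]) for p in passages]

loader = DataLoader(pairs, shuffle=True, batch_size=32)
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1)
```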
NoobMLDude@reddit
How well do you think it works for code?
curiousily_@reddit (OP)
In their Training Dataset section, they say: