EmbeddingGemma - 300M parameter, state-of-the-art for its size, open embedding model from Google
Posted by curiousily_@reddit | LocalLLaMA | 66 comments
Weights on HuggingFace: https://huggingface.co/google/embeddinggemma-300m
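A minimal usage sketch with sentence-transformers, assuming a recent version of the library that supports EmbeddingGemma (Google's blog shows it loading this way):

```python
# Sketch: encode a couple of sentences and get dense vectors back.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")
embeddings = model.encode([
    "Which planet is known as the Red Planet?",
    "Mars is often called the Red Planet for its reddish appearance.",
])
print(embeddings.shape)  # (2, 768) -- full output dimension
```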
johntdavies@reddit
Always good to see new models, and this looks pretty good. I see from the comparisons on the model card that it’s not as “good” as Qwen3-Embedding-0.6B though. I know Gemma is only half the size, but that’s quite a gap. Still, I look forward to trying it out; another embedding model will be very welcome.
cnmoro@reddit
Just tested it on my custom RAG bench for Portuguese and it was really bad :(
ivoencarnacao@reddit
Do you recommend any embedding model for Portuguese?
cnmoro@reddit
This one: https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe
Or my distilled version (a static model, if you need speed over quality): https://huggingface.co/cnmoro/nomic-embed-text-v2-moe-distilled-high-quality
ObjectiveOctopus2@reddit
Fine tune it for Portuguese
ivoencarnacao@reddit
I'm looking for an embedding model for a RAG project in Portuguese, better than all-MiniLM-L12-v2. This one might be the way to go, but I think it's too soon to tell!
secsilm@reddit
The Google blog says "it offers customizable output dimensions (from 768 to 128 via matryoshka representation)". Interesting, variable dimensions. First time hearing about it.
Common_Network@reddit
bruh, MRL (Matryoshka Representation Learning) has been out for the longest time; even nomic-embed supports it
secsilm@reddit
Never used it. In your opinion, is it better than a normal fixed dimension?
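For anyone unfamiliar: with MRL the leading dimensions carry most of the information, so you can truncate the vectors and renormalize, trading some quality for a much cheaper index. A minimal sketch, assuming the `truncate_dim` option available in recent sentence-transformers versions:

```python
# Sketch: same model, but keep only the first 128 dimensions (MRL).
from sentence_transformers import SentenceTransformer

small = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=128)
vec = small.encode("a query about local embedding models")
print(vec.shape)  # (128,) -- smaller index, slightly lower quality
```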
maglat@reddit
nomic-embed-text:v1.5 or this one? which one to use?
Common_Network@reddit
based on the charts alone, gemma is better
sanjuromack@reddit
Depends on what you need it for. Nomic is really performant, its context length is 4x longer, and it has image support via nomic-embed-vision:v1.5.
curiousily_@reddit (OP)
Too new to tell, my friend.
Away_Expression_3713@reddit
What do people actually use embedding models for? Like, I know the applications, but how do they actually help in practice?
igorwarzocha@reddit
Apart from obvious search engines, you can put it in between a bigger model and your database as a helper model. A few coding apps have this functionality; unsure if this actually helps or confuses the LLM even more.
I tried using it as a "matcher" for description vs keywords (or the other way round, can't remember) to match an image from a generic assets library to the entry, without having to do it manually. It kinda worked, but I went with bespoke generated imagery instead :>
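A rough sketch of that matcher idea; the asset keywords and description here are made up for illustration:

```python
# Sketch: score each asset's keyword string against a description and
# pick the best match by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

description = "a foggy mountain road at dawn"
asset_keywords = ["city skyline night", "mountain fog sunrise road", "beach palm trees"]

desc_vec = model.encode(description, convert_to_tensor=True)
asset_vecs = model.encode(asset_keywords, convert_to_tensor=True)

scores = util.cos_sim(desc_vec, asset_vecs)  # shape (1, num_assets)
best = scores.argmax().item()
print(asset_keywords[best])  # -> "mountain fog sunrise road"
```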
horsethebandthemovie@reddit
Which coding apps do you know of that use this kind of thing? I've been interested in trying something similar but haven't had the time; it's always hard to tell what $(random agent cli) is actually doing.
igorwarzocha@reddit
Yeah, they do it, but... I would recommend against it.
AI-generated code moves too fast; you NEED TO re-embed every file after every write-tool call, and the LLM would need to receive an update from the DB every time it wants to read a file.
People can think whatever they want, but I see it as context rot and a source of potentially many issues and slowdowns. It's mostly marketing AI-bro hype when you logically analyse it against the current limitations of LLMs. (I believe I saw Boris from Anthropic corroborating this somewhere, while explaining why CC is relatively simple.)
Last time I remember trying a feature like this, it was in Roo, I believe. Pretty sure this is also what Cursor does behind the scenes?
You could try Graphiti MCP, or the simplest and best idea: code a small script that creates an .md codebase map with your directory tree and file names (sketch below). @ it at the beginning of your sesh, and rerun & @ it again when the AI starts being dumb.
Hope this helps. I would avoid getting too complex with all of it.
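A minimal take on that codebase-map script, assuming a plain directory walk is enough (the skip list and output filename are just choices):

```python
# Sketch: walk the repo and write the directory tree plus file names
# to codebase.md, skipping common noise directories.
import os

SKIP = {".git", "node_modules", "__pycache__", ".venv"}

with open("codebase.md", "w", encoding="utf-8") as out:
    out.write("# Codebase map\n\n")
    for root, dirs, files in os.walk("."):
        dirs[:] = [d for d in dirs if d not in SKIP]  # prune noise in place
        depth = root.count(os.sep)
        out.write(f"{'  ' * depth}- {os.path.basename(root) or '.'}/\n")
        for name in sorted(files):
            out.write(f"{'  ' * (depth + 1)}- {name}\n")
```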
aeroumbria@reddit
Train diffusion models on generic text features as conditioning
plurch@reddit
Currently using embeddings for repo search here. That way you can get relevant results when the query is semantically similar, rather than relying only on keyword matching.
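For illustration, a generic semantic-search sketch of the idea (not the linked tool's actual code; the documents are stand-ins):

```python
# Sketch: embed a corpus once, then rank documents against a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")
docs = ["fix null pointer in parser", "add OAuth login flow", "speed up CSV import"]
doc_vecs = model.encode(docs, convert_to_tensor=True)

query_vec = model.encode("authentication bug", convert_to_tensor=True)
hits = util.semantic_search(query_vec, doc_vecs, top_k=2)[0]
for hit in hits:
    print(docs[hit["corpus_id"]], hit["score"])  # "add OAuth login flow" ranks first
```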
sammcj@reddit
That's a neat tool! Is it open source? I'd love to have a hack on it.
plurch@reddit
Thanks! It is not currently open source though.
Former-Ad-5757@reddit
For me it is a huge filter between database and LLM.
In my database I can have 50,000 classifications for products; I can't feed an LLM that kind of volume.
I use embeddings to get the ~500 most similar classifications and then I let the LLM go over those 500.
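A sketch of that embed-then-LLM funnel; `load_classifications` is a hypothetical stand-in for reading the 50,000 labels from the database:

```python
# Sketch: vector similarity cuts 50,000 candidates down to a shortlist,
# and only the shortlist goes into the LLM prompt.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

classifications = load_classifications()  # hypothetical: ~50,000 label strings
class_vecs = model.encode(classifications, convert_to_tensor=True)  # precompute once

product = "stainless steel 1.5l electric kettle"
product_vec = model.encode(product, convert_to_tensor=True)

hits = util.semantic_search(product_vec, class_vecs, top_k=500)[0]
shortlist = [classifications[h["corpus_id"]] for h in hits]
# ...then prompt the LLM to pick the best label from `shortlist`.
```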
Consistent-Donut-534@reddit
Search and retrieval, and also for when you have another model that you want to condition on text values. It's easier to just use a frozen off-the-shelf embedding model and train your model around it.
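A minimal sketch of that pattern in PyTorch: freeze the embedder and train only a small head on top (the head architecture here is illustrative):

```python
# Sketch: frozen text embedder as a conditioning signal; only the head trains.
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("google/embeddinggemma-300m")  # stays frozen

class ConditionedHead(torch.nn.Module):
    def __init__(self, text_dim=768, out_dim=10):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(text_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, out_dim),
        )

    def forward(self, text_emb):
        return self.net(text_emb)

head = ConditionedHead()
with torch.no_grad():  # no gradients flow into the embedder
    cond = encoder.encode(["a red sports car"], convert_to_tensor=True)
logits = head(cond)  # only head parameters receive updates during training
```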
ChankiPandey@reddit
recommendations
-Cubie-@reddit
Mostly semantic search/information retrieval
a_slay_nub@reddit
It's smaller, but it seems a fair bit worse than Qwen3-Embedding-0.6B.
SkyFeistyLlama8@reddit
How about compared to IBM Granite 278m?
ObjectiveOctopus2@reddit
It’s a lot better than that one.
ObjectiveOctopus2@reddit
You could also say it’s almost as good at half the size
IntoYourBrain@reddit
I'm new to all this. Trying to learn about local AI and stuff. What would the use case for something like this be?
ObjectiveOctopus2@reddit
Long term memory
danielhanchen@reddit
I combined all Q4_0, Q8_0 and BF16 quants into 1 folder if that's easier for people! https://huggingface.co/unsloth/embeddinggemma-300m-GGUF
We'll also make some cool RAG finetuning + normal RAG notebooks if anyone's interested, over the next couple of days!
Optimalutopic@reddit
You can even plug the model in here and enjoy local Perplexity-style search, vibe podcasting, and much more; it has FastAPI, MCP, and Python support: https://github.com/SPThole/CoexistAI
steezy13312@reddit
Are the q4_0 and q8_0 versions you have here the QAT versions?
danielhanchen@reddit
Oh yes: BF16 and F32 are the original (non-QAT) weights; Q8_0 is the Q8_0 QAT one, and Q4_0 is the Q4_0 QAT one.
We thought it's better to just put them all into 1 repo rather than 3 separate ones!
steezy13312@reddit
Thanks - that makes sense to me for sure.
V0dros@reddit
Thank you kind sir
NoPresentation7366@reddit
Thank you so much for being so passionate, you're super fast 😎💗
ValenciaTangerine@reddit
I was just looking to GGUF it. Thank you!
Present-Ad-8531@reddit
please explain license
Beestinge@reddit
How does it compare to BERT? That is also embedding only.
cristoper@reddit
It is a Sentence Transformer model, which is basically BERT for sentences.
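To illustrate the difference: plain BERT returns one vector per token, while a Sentence Transformer pools those into one vector per sentence that is trained to compare well. A quick sketch, assuming both libraries are installed:

```python
# Sketch: per-token BERT outputs vs pooled sentence embeddings.
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
tokens = tok("hello world", return_tensors="pt")
print(bert(**tokens).last_hidden_state.shape)  # (1, num_tokens, 768)

st = SentenceTransformer("google/embeddinggemma-300m")
print(st.encode("hello world").shape)  # (768,) -- one vector per sentence
```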
Icy_Foundation3534@reddit
Is this license permissive? Can I use it to build an app i’m selling?
CheatCodesOfLife@reddit
If you're not going to read their (very restrictive) license, just use this one, man: Qwen/Qwen3-Embedding-0.6B.
TeeRKee@reddit
WOW
ResponsibleTruck4717@reddit
I hope they will release it for ollama as well.
blackhawk74@reddit
Already released:
https://ollama.com/library/embeddinggemma
https://github.com/ollama/ollama/releases/tag/v0.11.10
Plato79x@reddit
How do you use this with ollama? Not with just `ollama run embeddinggemma` I believe...
agntdrake@reddit
curl localhost:11434/api/embed -d '{"model": "embeddinggemma", "input": "hello there"}'
agntdrake@reddit
We made the bf16 weights the default, but the q4_0 and q8_0 QAT weights are called `embeddinggemma:300m-qat-q4_0` and `embeddinggemma:300m-qat-q8_0`.
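A Python equivalent of the curl call above, pointed at the QAT q4_0 tag mentioned here (assumes Ollama is running on the default port):

```python
# Sketch: request an embedding from Ollama's /api/embed endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "embeddinggemma:300m-qat-q4_0", "input": "hello there"},
)
print(resp.json()["embeddings"][0][:5])  # first few dims of the vector
```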
ResponsibleTruck4717@reddit
Thanks :)
-Cubie-@reddit
There's comparison evaluations here: https://huggingface.co/blog/embeddinggemma
Here are the English scores; the multilingual ones are in the blog post (I can only add one attachment).
JEs4@reddit
Looks like I know what I’m doing this weekend.
DAlmighty@reddit
It’s interesting that they left Qwen 3 embedding out of that chart.
the__storm@reddit
Qwen3's smallest embedding model is 600M (but it is better on the published benchmarks): https://developers.googleblog.com/en/introducing-embeddinggemma/
https://github.com/QwenLM/Qwen3-Embedding
DAlmighty@reddit
Yeah I edited my post right before I saw this.
-Cubie-@reddit
The blogpost by Google themselves does have Qwen3 in their Multilingual figure: https://developers.googleblog.com/en/introducing-embeddinggemma/
TechySpecky@reddit
What benchmarks do you guys use to compare embedding quality on specific domains?
-Cubie-@reddit
https://huggingface.co/spaces/mteb/leaderboard is the go-to
TechySpecky@reddit
I wonder if it's worth fine tuning these. I need one for RAG specifically for archeology documents. I'm using the new Gemini one.
Holiday_Purpose_3166@reddit
Qwen3 4B has been my daily driver for my large codebases since the models came out, and it's the most performant for its size. The 8B starts to drag; there's virtually no difference from the 4B except it's slower and more memory-hungry, although it has bigger embeddings.
I've been tempted to downgrade to shave memory and increase speed, as this model seems efficient for its size.
-Cubie-@reddit
Finetuning definitely helps: https://huggingface.co/blog/embeddinggemma#finetuning
TechySpecky@reddit
Oh interesting, they fine-tune with question/answer pairs? I don't have that; I just have 500,000 pages of papers and books. I'll need to think about how to approach that.
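One common workaround when you only have raw pages is to generate synthetic questions per chunk with an LLM and train with in-batch negatives. A sketch using sentence-transformers' fit API; `generate_question` is a hypothetical wrapper around whatever LLM you use:

```python
# Sketch: build (question, passage) pairs, then fine-tune with
# MultipleNegativesRankingLoss (other in-batch pairs act as negatives).
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("google/embeddinggemma-300m")

passages = ["...page text...", "...another page..."]  # your pages, chunked
pairs = [InputExample(texts=[generate_question(p), p]) for p in passages]

loader = DataLoader(pairs, shuffle=True, batch_size=32)
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1)
```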
NoobMLDude@reddit
How well do you think it works for code?
curiousily_@reddit (OP)
In their Training Dataset section, they say: