Anyone using MedGemma 27B?
Posted by DeGreiff@reddit | LocalLLaMA | View on Reddit | 20 comments
I noticed MedGemma 27B is text-only, instruction-tuned (for inference-time compute), while 4B is the multimodal version. Interesting decision by Google.
Suitable_Currency440@reddit
Still testing. I know it's an old thread, but here are my two cents.
For consumers in the general market:
1. Good for explaining easy concepts; it has a solid grasp of most basic pathologies.
2. Good for understanding why you need to take your medications and what they are for.
Where it falls short:
1. Treatment: I've seen it get close enough to convince someone outside the field that it has the correct medication, yet wrong enough that following it would cause harm. So I would strongly advise against following its instructions on treatments.
1.1. I tested treatments for diabetes, pneumonia, rheumatology, and immunology; it only advises correctly on the easiest cases. Posology, route, and choice of medication it sometimes gets right and sometimes wrong, so it's still very crude in these aspects.
2. Lab images: it handles easy to mid-difficulty images, but can lack depth on harder ones (ultrasound, CT, MRI). Since most people don't know when something should or shouldn't be there, I would not rely on it for this.
How I made it work, for me at least:
As a RAG agent: I embedded my own database of medications, posologies, and treatments, and the model consults it to check whether its answers match current guidelines.
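A minimal sketch of that guideline-check idea, with a toy in-memory database and a keyword-overlap retriever standing in for real vector embeddings (the `GUIDELINES` entries, `retrieve`, and `build_prompt` names are all made up for illustration):

```python
# Toy stand-in for an embedded guideline database. A real setup would use
# vector embeddings over curated medication/posology documents; the entries
# below are illustrative placeholders, not medical advice.
GUIDELINES = {
    "metformin": "Type 2 diabetes, first line: start 500 mg with meals, titrate up.",
    "amoxicillin": "Community-acquired pneumonia, adults: 1 g orally every 8 hours.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank guideline entries by naive keyword overlap with the question."""
    q_words = set(question.lower().split())

    def score(item):
        name, text = item
        return len(q_words & (set(text.lower().split()) | {name}))

    ranked = sorted(GUIDELINES.items(), key=score, reverse=True)
    return [f"{name}: {text}" for name, text in ranked[:k]]

def build_prompt(question: str) -> str:
    """Constrain the model to the retrieved excerpts before it answers."""
    context = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the guideline excerpts below. If they do not "
        "cover the question, say so instead of guessing.\n\n"
        f"Guidelines:\n{context}\n\nQuestion: {question}"
    )
```

The point of the "ONLY the excerpts below" instruction is exactly what the comment describes: the model checks its answer against the curated database instead of free-styling posologies.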
Next_Land6577@reddit
Hello, I'm interested in your approach. Would it be possible to discuss it further to understand how you implemented these solutions, and in particular how you keep your medication database up to date? Looking forward to the exchange :)
Suitable_Currency440@reddit
Sure, PM me. I've actually kept improving it; it's now far easier, more accurate, and more accessible to build one.
Next_Land6577@reddit
I sent you a private message
RevolutionaryEase613@reddit
Hi, I was curious what tech stack you used for this. I'm having trouble implementing it myself.
Suitable_Currency440@reddit
I'll send you a message as soon as possible with more details.
RevolutionaryEase613@reddit
Thank you!
just_diegui@reddit
Working on a test of chest X-ray readings with a pulmonologist friend.
Using medgemma-27b-it-Q8_0.gguf on HF inference endpoints
bruanfargo@reddit
Does it support multimodal models? Can you upload images?
MutantEggroll@reddit
There are two variants: medgemma-27b-it is multimodal; medgemma-27b-text-it is text-only.
DeGreiff@reddit (OP)
Great to hear!
bruanfargo@reddit
Could you please link which medgemma model has HF inference endpoints available?
ttkciar@reddit
I recently evaluated MedGemma-27B. It seems very knowledgeable and can even extrapolate decently well from the implications of medical studies. Overall I like it.
However, it's oddly reticent to instruct the user to treat injuries or ailments. It's prone to urge the user to contact a doctor, hospital, or EMTs. I would have thought it would be trained to assume it was communicating with a doctor or EMT.
It's possible that I can remedy this with a system prompt telling it to advise a doctor at a hospital, but I haven't tried that yet.
(Yes, Gemma3 supports a system prompt, even though it's not "supposed to". In fact, system prompts work very well with it.)
padfoot_1024@reddit
Thanks for the insights! I had some follow-up questions: what was the average inference time, and how much VRAM did it require? (I'm assuming you used 8-bit quantization?)
ttkciar@reddit
Quite welcome :-) Incidentally, using the system prompt to tell it who it was advising, and in what setting, was very effective! See my follow-up comment here: https://old.reddit.com/r/LocalLLaMA/comments/1lvqtxa/multimodal_medgemma_27b/n28dhno/?context=3
I am using the Q4_K_M GGUF quantization. I am seeing about 11 tokens/second on my MI60 (32GB VRAM) using llama.cpp, and 2.5 tokens/second with pure CPU inference on my dual E5-2660v3 system.
Both the MI60 and E5-2660v3 are very old hardware -- the MI60 was released in late 2018, and the E5-2660v3 in late 2014. I would expect much faster inference on anything modern.
padfoot_1024@reddit
Thanks for the very detailed response! This was super helpful and helped me scope the hardware requirements :)
I totally agree that the 128K context is not practical: the attention would spread too thin over such a long context, defeating the purpose.
My objective is to build a RAG application. Looks like I'll have to rent an A100 GPU in the cloud, as I'm running short on hardware right now 😀
DeGreiff@reddit (OP)
Thanks. Yeah, that's odd, replying to users like any other random LLM. I guess Google doesn't want to step on the toes of their healthcare-specific AI tools, like Med-PaLM.
ttkciar@reddit
Following up on this: Using a system prompt of "You are a helpful medical assistant advising a doctor at a hospital." alleviated the model's reticence, caused it to recommend diagnostics and procedures available in a hospital setting, and I think encourages the model to infer more formal terminology as well. It's a win.
In production, the system prompt should probably be tailored to convey more precisely the target audience -- an ambulance EMT, a triage medic in the field, a pharmaceutical researcher, etc. My expectation is that it will give advice suited to the skills and equipment expected of the user and setting, but I will try it and see if that bears out.
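For reference, this is roughly how that system prompt could be passed to a llama.cpp server through its OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch: the model name and default audience are placeholders, and `build_request` is a hypothetical helper, not part of any library:

```python
import json

def build_request(user_question: str,
                  audience: str = "a doctor at a hospital") -> str:
    """Build a chat-completions payload with a tailorable audience."""
    payload = {
        "model": "medgemma-27b-text-it",  # placeholder; match your loaded GGUF
        "messages": [
            # Gemma 3 has no official system role, but in practice llama.cpp
            # folds this into the chat template and it works well.
            {"role": "system",
             "content": f"You are a helpful medical assistant advising {audience}."},
            {"role": "user", "content": user_question},
        ],
        "temperature": 0.2,  # keep clinical answers conservative
    }
    return json.dumps(payload)
```

Tailoring the prompt to "an ambulance EMT" or "a triage medic in the field", as suggested above, then becomes a one-argument change.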
andreasbeer1981@reddit
Seems it can only take images, but not .dcm .img or other common medical formats from scans? why?
jaxchang@reddit
Not surprising. Image models tend to be smaller; SD 3.0 is what, 2 billion params, and Flux 12 billion? Compare that to DeepSeek R1 at 671B params, or Qwen 3 at 235B, or even Gemma 3 at 27B. There's just a lot more information in text models that doesn't exist in images.
How do you draw "he betrayed her trust" as an image, or other abstract concepts, like the chain rule in calculus or a bug in a line of code? You can't.
Anyway, MedGemma is basically what it says on the tin. I played around with it for psych theories, and it's not better for that; it won't give you a better rundown of the concepts behind dialectical behavioral therapy, for example. But it IS better at overall summaries, and it knows shorthand like "dx", "fh", "PRN", etc. much better. So, basically exactly what they advertised.