How I use Gemma 3 to help me reply to my texts
Posted by sean01-eth@reddit | LocalLLaMA | 28 comments
Ever since code completions became a thing, I've wished I could have something similar when texting people. Now there's finally a decent way to do it.
The app works with any OpenAI-compatible endpoint. Once you set it up, it gives you texting completions right inside WhatsApp, Signal, and some other messaging apps.
I tested it with Gemma 3 4B running on my AMD Ryzen 4700U laptop. The suggestions come out slowly, but the quality is totally acceptable (the video is trimmed, but the suggestions really do come from Gemma 3 4B). I imagine that with a powerful setup, you could get these texting suggestions fully locally!
Here's a brief guide to making this work with Ollama (a request sketch follows these steps):
- Download the app from GitHub: https://github.com/coreply/coreply
- Download `gemma3:4b-it-qat` in Ollama
- Set the environment variable `OLLAMA_HOST` to `0.0.0.0` on the computer running Ollama and restart Ollama
- In the Coreply app, set the API URL to `http://192.168.xxx.xxx:11434/v1/` (replace `192.168.xxx.xxx` with the IP address of the Ollama machine) and the model name to `gemma3:4b-it-qat`
- Grant permissions and turn on the app. Enjoy your texting suggestions!
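For reference, this is roughly the kind of request the app ends up sending to the endpoint configured above. It's a minimal sketch of a standard OpenAI-style chat completion call, not Coreply's actual code; the IP address is a placeholder, as in the steps, and the prompt wording is invented for illustration.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Minimal sketch of an OpenAI-style chat completion request against the
// Ollama endpoint configured above. Not Coreply's actual code; the IP and
// prompt are placeholders.
fun main() {
    val url = URL("http://192.168.xxx.xxx:11434/v1/chat/completions")
    val body = """
        {
          "model": "gemma3:4b-it-qat",
          "messages": [
            {"role": "system", "content": "Suggest the user's next text message."},
            {"role": "user", "content": "Friend: Dinner tonight?\nMe:"}
          ],
          "max_tokens": 32
        }
    """.trimIndent()

    val conn = url.openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.setRequestProperty("Content-Type", "application/json")
    conn.doOutput = true
    conn.outputStream.use { it.write(body.toByteArray()) }

    // The suggestion text is in choices[0].message.content of the JSON reply.
    println(conn.inputStream.bufferedReader().readText())
}
```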
My laptop isn't powerful enough, so for daily use I use Gemini 2.0 Flash instead; just change the URL, API key, and model name (an example follows).
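For the hosted setup, the swap really is just those three fields. A hypothetical example, assuming Google's OpenAI-compatible endpoint (verify the current URL and model name against Google's docs):

```kotlin
// Hypothetical values for the hosted-model setup described above.
val apiUrl = "https://generativelanguage.googleapis.com/v1beta/openai/"
val apiKey = "YOUR_GEMINI_API_KEY"  // issued via Google AI Studio
val model = "gemini-2.0-flash"
```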
Let me know how your experience with it goes!
Evening_Ad6637@reddit
Just FYI: Apple has this built in and it's local, but it suggests a maximum of three or four words as of now
sean01-eth@reddit (OP)
I saw Android is getting something similar soon!
Commercial-Celery769@reddit
Please, Android (or someone), figure out how to use NPUs for mobile LLMs. It would be so much quicker and more efficient.
Evening_Ad6637@reddit
Well that’s what Apple is doing. And I assume androids will behave the same once they have it built-in
any41@reddit
Soon? It's been in Samsung devices since the launch of the S25 series
R46H4V@reddit
Wouldn't Gemma 3 1B QAT also be sufficient for this and way faster than 4B?
sean01-eth@reddit (OP)
Tried 1B but it seems very bad at this
webshield-in@reddit
Try Gemma 3n. It's available through the MediaPipe API on Android, so there's no need to run the model on a laptop.
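For anyone curious, the on-device route with MediaPipe's LLM Inference API looks roughly like this. A sketch, assuming a Gemma model bundle has already been pushed to the device; the model path is illustrative, and option names should be checked against the current MediaPipe docs.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Sketch of on-device generation with MediaPipe's LLM Inference API.
// The model path is hypothetical; you need a .task bundle on the device.
fun suggestReply(context: Context, conversation: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n-it.task") // hypothetical path
        .setMaxTokens(64) // keep suggestions short
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse("Suggest the next message:\n$conversation")
}
```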
Free-Cabinet6814@reddit
How do you read the messages? Through OCR? How do you parse the messages from the other person?
sean01-eth@reddit (OP)
It's using Android's accessibility API
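For context, reading chat text through the accessibility API looks roughly like this. This is not Coreply's actual implementation, just a minimal sketch of the general approach (the service also has to be declared in the manifest and enabled by the user):

```kotlin
import android.accessibilityservice.AccessibilityService
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

// Minimal sketch of reading on-screen chat text; not Coreply's actual code.
class ChatReaderService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent?) {
        // Fires when the visible window changes, e.g. a new message arrives.
        if (event?.eventType == AccessibilityEvent.TYPE_WINDOW_CONTENT_CHANGED) {
            val texts = mutableListOf<CharSequence>()
            rootInActiveWindow?.let { collectText(it, texts) }
            // `texts` now holds the visible message strings for the prompt.
        }
    }

    private fun collectText(node: AccessibilityNodeInfo, out: MutableList<CharSequence>) {
        node.text?.let { out.add(it) }
        for (i in 0 until node.childCount) {
            node.getChild(i)?.let { collectText(it, out) }
        }
    }

    override fun onInterrupt() {}
}
```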
ROOFisonFIRE_usa@reddit
Could this work in all apps, not just messaging apps?
sean01-eth@reddit (OP)
Technically it's possible by feeding in the on-screen content and asking the model "what's next?", but this version doesn't support that yet.
ROOFisonFIRE_usa@reddit
Would really like this feature. If I can help make this happen, let me know how.
tempetemplar@reddit
This is cool!
lucke2999@reddit
Hey! It's very cool thanks :)
My phone doesn't allow me to enable the accessibility service for the app (it's grayed out). Would you happen to know how to change that?
sean01-eth@reddit (OP)
Restricted settings? Go to the app info for Coreply, tap the three dots at the top right corner, and tap "Allow restricted settings". If you can't find the three dots, the option should be somewhere else on the same app info screen
Pvt_Twinkietoes@reddit
Hmm. Have you tried using a base model? I wonder if it would give better responses, since base models are trained for autocompletion.
leuchtetgruen@reddit
Nice, but the suggestions it gave me in German were unfortunately useless
sean01-eth@reddit (OP)
From my experience, Gemma 3 12B gives way better suggestions, but unfortunately it's too big for most setups.
dugavo@reddit
Have you tried Qwen3-30B-A3B, if it's possible to use chain-of-thought models? With /no_think, of course.
It should be much faster than Gemma 3 12B but very high quality.
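For reference, Qwen3's /no_think switch is just appended to the user turn; a tiny sketch, with a hypothetical prompt variable:

```kotlin
// Qwen3 soft switch: appending "/no_think" to the user turn disables the
// chain-of-thought output. `draft` is a hypothetical prompt variable.
val draft = "Suggest the next message for this conversation: ..."
val userContent = "$draft /no_think"
```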
sean01-eth@reddit (OP)
Just tried it briefly; it seems like my 16 GB of RAM struggles to keep the entire model in memory. But it does look like a way to improve performance with limited computational power.
dugavo@reddit
Yup, it runs fine on my PC with 32GB of memory, but it can eat 28GB+ (4-bit quant)
ChristopherRoberto@reddit
I'd be concerned that using LLMs for autocomplete would wind up getting me flagged as a bot when platforms start trying to fight back against bots.
Particular_Flow_8522@reddit
My Play Protect blocks the install.
JawGBoi@reddit
Very cool concept. However, the predictions I get from Gemma 3 4B aren't the best. Sometimes they're okay, but often it misunderstands who said what, or what the conversation is about.
I think you could probably improve reliability through better prompting techniques.
sean01-eth@reddit (OP)
Yeah, I also found it struggles to identify which messages were SENT and which were RECEIVED.
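One idea along those lines: label each turn explicitly in the prompt so the model doesn't have to infer the direction itself. A hypothetical prompt layout, not what Coreply currently does:

```kotlin
// Hypothetical prompt layout with explicit per-message direction labels;
// not Coreply's current prompt, just one way to attack the confusion.
val history = listOf(
    "RECEIVED" to "Are we still on for dinner tonight?",
    "SENT" to "Yes! Thinking 7pm?",
    "RECEIVED" to "Perfect, where?"
)
val prompt = buildString {
    appendLine("Complete the next message SENT by the user.")
    history.forEach { (direction, text) -> appendLine("[$direction] $text") }
    append("[SENT] ") // the model continues from here
}
```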
phhusson@reddit
Cool, just need to add MediaPipe to the app to get on-device Gemma 3n and make it edge-llama
jamaalwakamaal@reddit
I tried this with MNN server, then also with Ollama running locally on the phone. I tried it a few months back and it didn't work then; it doesn't work now either. The server logs show POST and GET requests, but that's it. No text ever comes out. Waste of time.