How I use Gemma 3 to help me reply to my texts
Posted by sean01-eth@reddit | LocalLLaMA | 28 comments
Ever since code completions became a thing, I've wished I could have something similar when texting people. Now there's finally a decent way to do it.
The app works with any OpenAI-compatible endpoint. Once you set it up, it gives you texting completions right inside WhatsApp, Signal, and some other messaging apps.
I tested it with Gemma 3 4B running on my AMD Ryzen 4700U laptop. The suggestions come out slowly, but the quality is totally acceptable (the video is trimmed, but the suggestions really do come from Gemma 3 4B). I imagine that with a powerful setup, you could get these texting suggestions fully locally!
Here's a brief guide to making this work with Ollama (a request sketch follows these steps):
- Download the app from GitHub: https://github.com/coreply/coreply
- Download `gemma3:4b-it-qat` in Ollama
- Set the environment variable `OLLAMA_HOST` to `0.0.0.0` on the computer running Ollama and restart Ollama
- In the Coreply app, set the API URL to `http://192.168.xxx.xxx:11434/v1/` (replace `192.168.xxx.xxx` with the IP address of the Ollama machine) and the model name to `gemma3:4b-it-qat`
- Grant permissions and turn on the app. Enjoy your texting suggestions!
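For reference, this is roughly the kind of request the app ends up sending to the endpoint configured above. It's a minimal sketch of a standard OpenAI-style chat completion call, not Coreply's actual code; the IP address is a placeholder, as in the steps, and the prompt wording is invented for illustration.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Minimal sketch of an OpenAI-style chat completion request against the
// Ollama endpoint configured above. Not Coreply's actual code; the IP and
// prompt are placeholders.
fun main() {
    val url = URL("http://192.168.xxx.xxx:11434/v1/chat/completions")
    val body = """
        {
          "model": "gemma3:4b-it-qat",
          "messages": [
            {"role": "system", "content": "Suggest the user's next text message."},
            {"role": "user", "content": "Friend: Dinner tonight?\nMe:"}
          ],
          "max_tokens": 32
        }
    """.trimIndent()

    val conn = url.openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.setRequestProperty("Content-Type", "application/json")
    conn.doOutput = true
    conn.outputStream.use { it.write(body.toByteArray()) }

    // The suggestion text is in choices[0].message.content of the JSON reply.
    println(conn.inputStream.bufferedReader().readText())
}
```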
My laptop isn't powerful enough, so for daily use I use Gemini 2.0 Flash instead; just change the URL, API key, and model name (an example follows).
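For the hosted setup, the swap really is just those three fields. A hypothetical example, assuming Google's OpenAI-compatible endpoint (verify the current URL and model name against Google's docs):

```kotlin
// Hypothetical values for the hosted-model setup described above.
val apiUrl = "https://generativelanguage.googleapis.com/v1beta/openai/"
val apiKey = "YOUR_GEMINI_API_KEY"  // issued via Google AI Studio
val model = "gemini-2.0-flash"
```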
Let me know how your experience with it goes!
Evening_Ad6637@reddit
Just FYI: Apple has this built in and it's local, but it suggests a maximum of three or four words as of now
sean01-eth@reddit (OP)
I saw Android is getting something similar soon!
Commercial-Celery769@reddit
Please, Android (or someone), figure out how to use NPUs for mobile LLMs. It would be so much quicker and more efficient.
Evening_Ad6637@reddit
Well that’s what Apple is doing. And I assume androids will behave the same once they have it built-in
any41@reddit
Soon? It's been in Samsung devices since the launch of the S25 series
R46H4V@reddit
Wouldn't Gemma 3 1B QAT also be sufficient for this and way faster than 4B?
sean01-eth@reddit (OP)
Tried 1B but it seems very bad at this
webshield-in@reddit
Try Gemma 3n. It's available through the MediaPipe API on Android, so there's no need to run the model on a laptop.
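For anyone curious, the on-device route with MediaPipe's LLM Inference API looks roughly like this. A sketch, assuming a Gemma model bundle has already been pushed to the device; the model path is illustrative, and option names should be checked against the current MediaPipe docs.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Sketch of on-device generation with MediaPipe's LLM Inference API.
// The model path is hypothetical; you need a .task bundle on the device.
fun suggestReply(context: Context, conversation: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n-it.task") // hypothetical path
        .setMaxTokens(64) // keep suggestions short
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse("Suggest the next message:\n$conversation")
}
```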
Free-Cabinet6814@reddit
How do you read the messages? Through OCR? How do you parse the messages from the other person?
sean01-eth@reddit (OP)
It's using Android's accessibility API
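For context, reading chat text through the accessibility API looks roughly like this. This is not Coreply's actual implementation, just a minimal sketch of the general approach (the service also has to be declared in the manifest and enabled by the user):

```kotlin
import android.accessibilityservice.AccessibilityService
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

// Minimal sketch of reading on-screen chat text; not Coreply's actual code.
class ChatReaderService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent?) {
        // Fires when the visible window changes, e.g. a new message arrives.
        if (event?.eventType == AccessibilityEvent.TYPE_WINDOW_CONTENT_CHANGED) {
            val texts = mutableListOf<CharSequence>()
            rootInActiveWindow?.let { collectText(it, texts) }
            // `texts` now holds the visible message strings for the prompt.
        }
    }

    private fun collectText(node: AccessibilityNodeInfo, out: MutableList<CharSequence>) {
        node.text?.let { out.add(it) }
        for (i in 0 until node.childCount) {
            node.getChild(i)?.let { collectText(it, out) }
        }
    }

    override fun onInterrupt() {}
}
```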
ROOFisonFIRE_usa@reddit
Could this work in all apps, not just messaging apps?
sean01-eth@reddit (OP)
Technically it's possible by feeding in the on-screen content and asking the model "what's next?", but this version doesn't support that yet.
ROOFisonFIRE_usa@reddit
Would really like this feature. If I can help make this happen, let me know how.
tempetemplar@reddit
This is cool!
lucke2999@reddit
Hey! It's very cool thanks :)
My phone doesn't allow me to enable the accessibility service for the app (it's grayed out). Would you happen to know how to change that?
sean01-eth@reddit (OP)
Restricted settings? Go to the app info for Coreply, tap the three dots at the top right corner, and tap "Allow restricted settings". If you can't find the three dots, the option should be somewhere else on the same app info screen
Pvt_Twinkietoes@reddit
Hmm. Have you tried using a base model? I wonder if it would give better responses, since base models are trained for autocompletion.
leuchtetgruen@reddit
Nice, but the suggestions it gave me in German were unfortunately useless
sean01-eth@reddit (OP)
From my experience, Gemma 3 12B gives way better suggestions, but unfortunately it's too big for most setups.
dugavo@reddit
Have you tried Qwen3-30B-A3B, if it's possible to use chain-of-thought models? With /no_think, of course.
It should be much faster than Gemma 3 12B but very high quality.
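For reference, Qwen3's /no_think switch is just appended to the user turn; a tiny sketch, with a hypothetical prompt variable:

```kotlin
// Qwen3 soft switch: appending "/no_think" to the user turn disables the
// chain-of-thought output. `draft` is a hypothetical prompt variable.
val draft = "Suggest the next message for this conversation: ..."
val userContent = "$draft /no_think"
```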
sean01-eth@reddit (OP)
Just tried it briefly; it seems like my 16 GB of RAM struggles to keep the entire model in memory. But it does look like a way to improve performance with limited computational power.
dugavo@reddit
Yup, it runs fine on my PC with 32GB of memory, but it can eat 28GB+ (4-bit quant)
ChristopherRoberto@reddit
I'd be concerned that using LLMs for autocomplete would wind up getting me flagged as a bot when platforms start trying to fight back against bots.
Particular_Flow_8522@reddit
My Play Protect blocks the install.
JawGBoi@reddit
Very cool concept. However, the predictions I get from Gemma 3 4B aren't the best. Sometimes they're okay, but often it misunderstands who said what, or what the conversation is about.
I think you could probably improve reliability through better prompting techniques.
sean01-eth@reddit (OP)
Yeah, I also found it struggles to identify which messages were SENT and which were RECEIVED.
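One idea along those lines: label each turn explicitly in the prompt so the model doesn't have to infer the direction itself. A hypothetical prompt layout, not what Coreply currently does:

```kotlin
// Hypothetical prompt layout with explicit per-message direction labels;
// not Coreply's current prompt, just one way to attack the confusion.
val history = listOf(
    "RECEIVED" to "Are we still on for dinner tonight?",
    "SENT" to "Yes! Thinking 7pm?",
    "RECEIVED" to "Perfect, where?"
)
val prompt = buildString {
    appendLine("Complete the next message SENT by the user.")
    history.forEach { (direction, text) -> appendLine("[$direction] $text") }
    append("[SENT] ") // the model continues from here
}
```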
phhusson@reddit
Cool, just need to add MediaPipe to the app to get on-device Gemma 3n and make it edge-llama
jamaalwakamaal@reddit
I tried this with MNN server, then also with Ollama running locally on the phone. I tried it a few months back and it didn't work then; it doesn't work now either. The server logs show POST and GET requests, but that's it. No text ever comes out. Waste of time.