rhinodevil

Man trains local model to detect and kill mosquitos with a laser

Posted by No_Information9314@reddit | LocalLLaMA | View on Reddit | 48 comments

[-]

rhinodevil@reddit

Thanks. "Mild laser enjoyer" 😄 I was wondering what kind of laser you'd need to burn the mosqitoes with such a visible flame as in the video.

Man trains local model to detect and kill mosquitos with a laser

Posted by No_Information9314@reddit | LocalLLaMA | View on Reddit | 48 comments

[-]

rhinodevil@reddit

What kind of "laser" was used? Is that something you can buy/build yourself??

How to build a shitty robot

Posted by badlogicgames@reddit | LocalLLaMA | View on Reddit | 7 comments

[-]

Seems to me like no RAG, if I understand correctly. I was just asking, because I have a project where I do memory extraction from recent chats and later memory injection via RAG (cosine similarity score & reranking) and I experienced the local LLMs to be limited in understanding that and very sensitive about the prompt structure (smaller LLMs get the injected memories and the actual user prompt confused more often, than larger LLMs, naturally).

How to build a shitty robot

Posted by badlogicgames@reddit | LocalLLaMA | View on Reddit | 7 comments

[-]

rhinodevil@reddit

I was always wondering (no joke) why there is no electronic pet from e.g. Apple, like this [https://www.pleoworld.com/pleo\_rb/eng/products.php](https://www.pleoworld.com/pleo_rb/eng/products.php) , but in good.

How to build a shitty robot

Posted by badlogicgames@reddit | LocalLLaMA | View on Reddit | 7 comments

[-]

rhinodevil@reddit

Very nice! Back in the days (even pre-LLM) I tried to build a robot with a smartphone as its brain with [https://www.kickstarter.com/projects/peterseid/romo-the-smartphone-robot-for-everyone/](https://www.kickstarter.com/projects/peterseid/romo-the-smartphone-robot-for-everyone/) and [https://www.robot-advance.com/EN/art-rovio-1178.htm](https://www.robot-advance.com/EN/art-rovio-1178.htm) . I did not get that far.. About the "boring" software parts: Are you happy with the memory system (if it is already implemented)? What did you use / how does it work?

STT -> LLM -> TTS pipeline

Posted by UniqueIdentifier00@reddit | LocalLLaMA | View on Reddit | 30 comments

[-]

rhinodevil@reddit

Posted an example using Whisper.cpp, llama.cpp and Piper for exactly this here a while ago: [https://www.reddit.com/r/LocalLLaMA/comments/1nj673e/stt\_llm\_tts\_pipeline\_in\_c/](https://www.reddit.com/r/LocalLLaMA/comments/1nj673e/stt_llm_tts_pipeline_in_c/)

Why bother with local LLMs?

Posted by West-Currency-4423@reddit | LocalLLaMA | View on Reddit | 39 comments

[-]

rhinodevil@reddit

More people should use Linux! :-) But I get what you mean.

Audio processing landed in llama-server with Gemma-4

Posted by srigi@reddit | LocalLLaMA | View on Reddit | 72 comments

[-]

rhinodevil@reddit

Problems with the silence did not happen to me.. Maybe a configuration issue? Or using one of the smaller models? Is Parakeet also better than Whisper for other languages than English?

I put a transformer model on a stock Commodore 64

Posted by gizmo64k@reddit | LocalLLaMA | View on Reddit | 31 comments

[-]

rhinodevil@reddit

Also good for the electric bill, because the 6502 consumes the same amount of power, no matter what it is currently doing! ;-)

In terms of Quality, how good is Bonsai 8B?

Posted by AsrielPlay52@reddit | LocalLLaMA | View on Reddit | 3 comments

[-]

rhinodevil@reddit

BTW: It is based on Qwen 3 8b.

Gemma 4 26b A3B is mindblowingly good , if configured right

Posted by cviperr33@reddit | LocalLLaMA | View on Reddit | 371 comments

[-]

rhinodevil@reddit

That's what llama-bench is for: [https://github.com/ggml-org/llama.cpp/tree/master/tools/llama-bench](https://github.com/ggml-org/llama.cpp/tree/master/tools/llama-bench)

Local AI Agent Wake words

Posted by betanu701@reddit | LocalLLaMA | View on Reddit | 9 comments

[-]

rhinodevil@reddit

Maybe this works better? [https://github.com/TaterTotterson/microWakeWord-Trainer-Nvidia-Docker](https://github.com/TaterTotterson/microWakeWord-Trainer-Nvidia-Docker)

Local AI Agent Wake words

Posted by betanu701@reddit | LocalLLaMA | View on Reddit | 9 comments

[-]

rhinodevil@reddit

I also tried this a while ago. Not so easy to get workable results (I didn't).

SOTA Language Models Under 14B?

Posted by No-Mud-1902@reddit | LocalLLaMA | View on Reddit | 25 comments

[-]

rhinodevil@reddit

From my experience, being relatively GPU-poor: With a GeForce Mobile 4060 8 GB, llama.cpp and Windows I am getting about 6 tokens per second with 27B-UD-IQ3\_XXS (from Unsloth). Although 27B does NOT run entirely on that GPU, so RAM and CPU are also playing a part, here! I am expecting it to be a bit faster with Linux. The 9B Qwen 3.5 runs in Linux with 40 to 50 tokens per second, GeForce 3060 6GB, entirely on the GPU (sorry, did not test every combination, have two different GeForce laptops, here).

I just want to catch up on local LLM's after work..

Posted by ForsookComparison@reddit | LocalLLaMA | View on Reddit | 47 comments

[-]

rhinodevil@reddit

Running it on a GeForce Mobile 3060 with 6GB (Q4K\_M). In the 40 to 50 tokens/s range. llama.cpp.

Question: Prompt format for memory injection (local offline AI assistant, 6GB VRAM)?

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 5 comments

[-]

rhinodevil@reddit (OP)

It may be two things here: 1) A smaller model having trouble differentiating between injected data and user input. 2) The smaller model does differentiate between injected memories and current chat context, but it is too difficult for the smaller model to always make the right choice about when to use what injected information in its answer. Qwen 3.5 9B may be smart enough for (1), but might always have trouble with (2). Maybe enabling reasoning/thinking could help there, but then it would take too long to get an answer..

Question: Prompt format for memory injection (local offline AI assistant, 6GB VRAM)?

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 5 comments

[-]

rhinodevil@reddit (OP)

Thanks for you answer. Could you please explain a bit how model2vec would help to get answers in chat that do not interpret the injected memories as part of the chat context?

How do you think a Qwen 72B dense would perform?

Posted by OmarBessa@reddit | LocalLLaMA | View on Reddit | 32 comments

[-]

rhinodevil@reddit

On the low end, at least for my use cases, the dense 9B model performes way better than the 30B and 35B(-A3B) MoE models. The dense 27B model is unfortunately too slow on my consumer-grade hardware. So I guess a 72B dense model would perform much better than 9B and 27B.

What is the best open-source options to create a pipeline like ElevenLab (Speech-to-text, brain LLM and text-to-speech)

Posted by frequiem11@reddit | LocalLLaMA | View on Reddit | 4 comments

[-]

rhinodevil@reddit

I built a little sample pipeline for this with my "wrapper" libraries based on Whisper.cpp, llama.cpp and Piper, in C/C++, here: [https://github.com/RhinoDevel/mt\_llm/tree/main/stt\_llm\_tts-pipeline-example](https://github.com/RhinoDevel/mt_llm/tree/main/stt_llm_tts-pipeline-example)

STT –> LLM –> TTS pipeline in C

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 6 comments

[-]

rhinodevil@reddit (OP)

Really depends on multiple factors, but STT via Whisper.ccp, e.g. with large-v3-turbo-q5\_0, is pretty fast, even without a CUDA device, TTS via Piper is extremely fast (and I am fine with the output quality, even in non-english languages, although there are more modern, but also more hardware-hungry TTS modules out there) and LLM inference via Llama.cpp takes a lot more time than STT and TTS. But you can implement TTS-by-sentence to let the user already hear the LLM's answer while the LLM is still generating it.

STT –> LLM –> TTS pipeline in C

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 6 comments

[-]

rhinodevil@reddit (OP)

Thanks for hint, didn't know about [https://github.com/leejet/stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp)

STT –> LLM –> TTS pipeline in C

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 6 comments

[-]

rhinodevil@reddit (OP)

Maybe not so simple, because the libraries used (llama.cpp, whisper.cpp, Piper, etc.) must also be compiled to web assembly.

Running Gemma 3n on mobile locally

Posted by United_Dimension_46@reddit | LocalLLaMA | View on Reddit | 59 comments

[-]

rhinodevil@reddit

Just installed APK & model after downloading (see my other post). No licence agreements anywhere.

Running Gemma 3n on mobile locally

Posted by United_Dimension_46@reddit | LocalLLaMA | View on Reddit | 59 comments

[-]

rhinodevil@reddit

Just downloaded the APK & model file manually, installed on the phone, disabled internet access and it works. The APK file is downloadable from GitHub: [https://github.com/google-ai-edge/gallery/releases/tag/1.0.0](https://github.com/google-ai-edge/gallery/releases/tag/1.0.0) The models from Huggingface, e.g. E2B: [https://huggingface.co/google/gemma-3n-E2B-it-litert-preview/tree/main](https://huggingface.co/google/gemma-3n-E2B-it-litert-preview/tree/main)

L2E llama2.c on Commodore C-64

Posted by AMICABoard@reddit | LocalLLaMA | View on Reddit | 14 comments

[-]

rhinodevil@reddit

Awesome! So a 240kB, "Q8" model and a 2MB REU. Any idea yet how fast a native port of llama2.c would run with that on a real C64? :-)

L2E llama2.c on Commodore C-64

Posted by AMICABoard@reddit | LocalLLaMA | View on Reddit | 14 comments

[-]

rhinodevil@reddit

As this is not directly implemented in 6502 assembly as it seems, one could probably heavily improve performance by porting llama2.c directly to 6502. :-)

Mistral Small 3 24b is the first model under 70b I’ve seen pass the “apple” test (even using Q4).

Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 51 comments

[-]

rhinodevil@reddit

After playing around with it a bit more: There are the occasional missing syllables in (e.g.) German, but all in all very high quality.

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

[-]

rhinodevil@reddit

Just checked out CohereForAI.aya-expanse-8b.Q5\_K\_M - pretty awesome German language support for 8b and in comparance to (e.g.) Qwen 14, too! Thanks for the hint.

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

[-]

rhinodevil@reddit

I was hoping for Teuken 7b as a (relatively) small model that is good at German, but (at least the tokenizer) is not 100% supported by llama.cpp.

Mistral Small 3 24b is the first model under 70b I’ve seen pass the “apple” test (even using Q4).

Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 51 comments

[-]

rhinodevil@reddit

Are the 70B models in general able to write without errors in german? My experience with smaller models is that this does not work on a production-ready quality (e.g. Qwen 14B). But the new Mistral 24B model does a nice job writing in german, so far.

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

[-]

rhinodevil@reddit

Maybe I just got lucky!

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

[-]

rhinodevil@reddit

I agree, most "small" LLMs are not that good in speaking german (e.g. Qwen 14). But the answer is YES.

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

[-]

rhinodevil@reddit

Hm, tried Q4\_K\_M with llama.cpp and the model got it right (unless I counted wrong, too..).