rhinodevil

Man trains local model to detect and kill mosquitos with a laser

Posted by No_Information9314@reddit | LocalLLaMA | View on Reddit | 48 comments

rhinodevil@reddit

Thanks. "Mild laser enjoyer" 😄 I was wondering what kind of laser you'd need to burn the mosqitoes with such a visible flame as in the video.

Man trains local model to detect and kill mosquitos with a laser

Posted by No_Information9314@reddit | LocalLLaMA | View on Reddit | 48 comments

How to build a shitty robot

Posted by badlogicgames@reddit | LocalLLaMA | View on Reddit | 7 comments

rhinodevil@reddit

Seems to me like no RAG, if I understand correctly. I was just asking, because I have a project where I do memory extraction from recent chats and later memory injection via RAG (cosine similarity score & reranking) and I experienced the local LLMs to be limited in understanding that and very sensitive about the prompt structure (smaller LLMs get the injected memories and the actual user prompt confused more often, than larger LLMs, naturally).

How to build a shitty robot

Posted by badlogicgames@reddit | LocalLLaMA | View on Reddit | 7 comments

rhinodevil@reddit

I was always wondering (no joke) why there is no electronic pet from e.g. Apple, like this [https://www.pleoworld.com/pleo\_rb/eng/products.php](https://www.pleoworld.com/pleo_rb/eng/products.php) , but in good.

How to build a shitty robot

Posted by badlogicgames@reddit | LocalLLaMA | View on Reddit | 7 comments

rhinodevil@reddit

Very nice! Back in the days (even pre-LLM) I tried to build a robot with a smartphone as its brain with [https://www.kickstarter.com/projects/peterseid/romo-the-smartphone-robot-for-everyone/](https://www.kickstarter.com/projects/peterseid/romo-the-smartphone-robot-for-everyone/) and [https://www.robot-advance.com/EN/art-rovio-1178.htm](https://www.robot-advance.com/EN/art-rovio-1178.htm) . I did not get that far.. About the "boring" software parts: Are you happy with the memory system (if it is already implemented)? What did you use / how does it work?

STT -> LLM -> TTS pipeline

Posted by UniqueIdentifier00@reddit | LocalLLaMA | View on Reddit | 30 comments

rhinodevil@reddit

Posted an example using Whisper.cpp, llama.cpp and Piper for exactly this here a while ago: [https://www.reddit.com/r/LocalLLaMA/comments/1nj673e/stt\_llm\_tts\_pipeline\_in\_c/](https://www.reddit.com/r/LocalLLaMA/comments/1nj673e/stt_llm_tts_pipeline_in_c/)

Why bother with local LLMs?

Posted by West-Currency-4423@reddit | LocalLLaMA | View on Reddit | 39 comments

Audio processing landed in llama-server with Gemma-4

Posted by srigi@reddit | LocalLLaMA | View on Reddit | 72 comments

rhinodevil@reddit

Problems with the silence did not happen to me.. Maybe a configuration issue? Or using one of the smaller models? Is Parakeet also better than Whisper for other languages than English?

I put a transformer model on a stock Commodore 64

Posted by gizmo64k@reddit | LocalLLaMA | View on Reddit | 31 comments

In terms of Quality, how good is Bonsai 8B?

Posted by AsrielPlay52@reddit | LocalLLaMA | View on Reddit | 3 comments

Gemma 4 26b A3B is mindblowingly good , if configured right

Posted by cviperr33@reddit | LocalLLaMA | View on Reddit | 371 comments

rhinodevil@reddit

That's what llama-bench is for: [https://github.com/ggml-org/llama.cpp/tree/master/tools/llama-bench](https://github.com/ggml-org/llama.cpp/tree/master/tools/llama-bench)

Local AI Agent Wake words

Posted by betanu701@reddit | LocalLLaMA | View on Reddit | 9 comments

rhinodevil@reddit

Maybe this works better? [https://github.com/TaterTotterson/microWakeWord-Trainer-Nvidia-Docker](https://github.com/TaterTotterson/microWakeWord-Trainer-Nvidia-Docker)

Local AI Agent Wake words

Posted by betanu701@reddit | LocalLLaMA | View on Reddit | 9 comments

SOTA Language Models Under 14B?

Posted by No-Mud-1902@reddit | LocalLLaMA | View on Reddit | 25 comments

rhinodevil@reddit

From my experience, being relatively GPU-poor: With a GeForce Mobile 4060 8 GB, llama.cpp and Windows I am getting about 6 tokens per second with 27B-UD-IQ3\_XXS (from Unsloth). Although 27B does NOT run entirely on that GPU, so RAM and CPU are also playing a part, here! I am expecting it to be a bit faster with Linux. The 9B Qwen 3.5 runs in Linux with 40 to 50 tokens per second, GeForce 3060 6GB, entirely on the GPU (sorry, did not test every combination, have two different GeForce laptops, here).

I just want to catch up on local LLM's after work..

Posted by ForsookComparison@reddit | LocalLLaMA | View on Reddit | 47 comments

Question: Prompt format for memory injection (local offline AI assistant, 6GB VRAM)?

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 5 comments

rhinodevil@reddit (OP)

It may be two things here: 1) A smaller model having trouble differentiating between injected data and user input. 2) The smaller model does differentiate between injected memories and current chat context, but it is too difficult for the smaller model to always make the right choice about when to use what injected information in its answer. Qwen 3.5 9B may be smart enough for (1), but might always have trouble with (2). Maybe enabling reasoning/thinking could help there, but then it would take too long to get an answer..

Question: Prompt format for memory injection (local offline AI assistant, 6GB VRAM)?

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 5 comments

rhinodevil@reddit (OP)

Thanks for you answer. Could you please explain a bit how model2vec would help to get answers in chat that do not interpret the injected memories as part of the chat context?

How do you think a Qwen 72B dense would perform?

Posted by OmarBessa@reddit | LocalLLaMA | View on Reddit | 32 comments

rhinodevil@reddit

On the low end, at least for my use cases, the dense 9B model performes way better than the 30B and 35B(-A3B) MoE models. The dense 27B model is unfortunately too slow on my consumer-grade hardware. So I guess a 72B dense model would perform much better than 9B and 27B.

What is the best open-source options to create a pipeline like ElevenLab (Speech-to-text, brain LLM and text-to-speech)

Posted by frequiem11@reddit | LocalLLaMA | View on Reddit | 4 comments

rhinodevil@reddit

I built a little sample pipeline for this with my "wrapper" libraries based on Whisper.cpp, llama.cpp and Piper, in C/C++, here: [https://github.com/RhinoDevel/mt\_llm/tree/main/stt\_llm\_tts-pipeline-example](https://github.com/RhinoDevel/mt_llm/tree/main/stt_llm_tts-pipeline-example)

STT –> LLM –> TTS pipeline in C

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 6 comments

rhinodevil@reddit (OP)

Really depends on multiple factors, but STT via Whisper.ccp, e.g. with large-v3-turbo-q5\_0, is pretty fast, even without a CUDA device, TTS via Piper is extremely fast (and I am fine with the output quality, even in non-english languages, although there are more modern, but also more hardware-hungry TTS modules out there) and LLM inference via Llama.cpp takes a lot more time than STT and TTS. But you can implement TTS-by-sentence to let the user already hear the LLM's answer while the LLM is still generating it.

STT –> LLM –> TTS pipeline in C

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 6 comments

STT –> LLM –> TTS pipeline in C

Posted by rhinodevil@reddit | LocalLLaMA | View on Reddit | 6 comments

Running Gemma 3n on mobile locally

Posted by United_Dimension_46@reddit | LocalLLaMA | View on Reddit | 59 comments

Running Gemma 3n on mobile locally

Posted by United_Dimension_46@reddit | LocalLLaMA | View on Reddit | 59 comments

rhinodevil@reddit

Just downloaded the APK & model file manually, installed on the phone, disabled internet access and it works. The APK file is downloadable from GitHub: [https://github.com/google-ai-edge/gallery/releases/tag/1.0.0](https://github.com/google-ai-edge/gallery/releases/tag/1.0.0) The models from Huggingface, e.g. E2B: [https://huggingface.co/google/gemma-3n-E2B-it-litert-preview/tree/main](https://huggingface.co/google/gemma-3n-E2B-it-litert-preview/tree/main)

L2E llama2.c on Commodore C-64

Posted by AMICABoard@reddit | LocalLLaMA | View on Reddit | 14 comments

L2E llama2.c on Commodore C-64

Posted by AMICABoard@reddit | LocalLLaMA | View on Reddit | 14 comments

rhinodevil@reddit

As this is not directly implemented in 6502 assembly as it seems, one could probably heavily improve performance by porting llama2.c directly to 6502. :-)

Mistral Small 3 24b is the first model under 70b I’ve seen pass the “apple” test (even using Q4).

Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 51 comments

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

rhinodevil@reddit

Just checked out CohereForAI.aya-expanse-8b.Q5\_K\_M - pretty awesome German language support for 8b and in comparance to (e.g.) Qwen 14, too! Thanks for the hint.

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

rhinodevil@reddit

I was hoping for Teuken 7b as a (relatively) small model that is good at German, but (at least the tokenizer) is not 100% supported by llama.cpp.

Mistral Small 3 24b is the first model under 70b I’ve seen pass the “apple” test (even using Q4).

Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 51 comments

rhinodevil@reddit

Are the 70B models in general able to write without errors in german? My experience with smaller models is that this does not work on a production-ready quality (e.g. Qwen 14B). But the new Mistral 24B model does a nice job writing in german, so far.

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments

mistral-small-24b-instruct-2501 is simply the best model ever made.

Posted by hannibal27@reddit | LocalLLaMA | View on Reddit | 352 comments