LeMochileiro

Man trains local model to detect and kill mosquitos with a laser

Posted by No_Information9314@reddit | LocalLLaMA | View on Reddit | 48 comments

Linux ROCm now supports WSL2 sanely (but isn't bug free yet), build instructions included

Posted by Diablo-D3@reddit | LocalLLaMA | View on Reddit | 8 comments

LeMochileiro@reddit

Recently, I ran Qwen3.6-35B-A3B on 2x RX9060 (16GB VRAM each) plus 32GB of RAM with ROCm. ROCm had an "I bought the whole piano, I'm gonna use the whole piano" vibe: it pushes the GPUs to their limit as if there were no tomorrow. Average power consumption was 140W on both GPUs, with temperatures rising by 20°C during extended runs. Ultimately, I didn't notice a significant performance difference compared to Vulkan, which averages 90W and doesn't put much strain on the GPUs. For now I'll keep using Vulkan, while checking different ROCm configurations and parameters weekly. Vulkan works perfectly with this model. Using harness engineering, I can implement complex features in real-world projects.

Poor performance on RX 9070 XT

Posted by WhatererBlah555@reddit | LocalLLaMA | View on Reddit | 25 comments

LeMochileiro@reddit

> Is your ttft better with lemonade then

> but what kills me is the huge reprocessing time with larger contexts on OpenCode

Those are exactly the problems I had before, and somehow Lemonade solved them. I'll even research further to see how they do it behind the scenes. One downside of Lemonade is its poor documentation, but even so, I recommend trying it. I also use it with OpenCode and it works very well. Try it and then comment here about your experience.

Poor performance on RX 9070 XT

Posted by WhatererBlah555@reddit | LocalLLaMA | View on Reddit | 25 comments

LeMochileiro@reddit

I'm running an RX9060 with 16GB of VRAM and 32GB of RAM. Things I've learned:

* It has to run on Linux. GPU drivers for AMD work much better and are more stable on Linux.
* It won't be as fast as Nvidia GPUs, so don't compare your results too much with other users'. Start by doing your own real-world tests to find an acceptable performance range for your LLM tasks.
* Personally, MoE gave me better results than MTP.
* Try using Lemonade, it's easy to set up.

Last week I migrated from LocalAI and started using Lemonade. The spec settings are much more difficult in Lemonade, but its auto-configuration is so good that I didn't need to change the specs. Inference is so smooth now. I'm going to buy another RX9060 16GB to try running larger models and be ready for next-generation models.

I'm running Qwen3.6-35B-A3B, and I'm getting around 25-30 tokens per second with Lemonade. With LocalAI I was getting 23 tokens per second, but my biggest problem was TTFT, which made the experience horrible.
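If you want to do your own real-world tests, here is a rough sketch of how TTFT and decode speed could be measured from any streaming response. It is not tied to Lemonade or LocalAI: `measure_stream` is a hypothetical helper, and it assumes your client gives you an iterator that yields one chunk per generated token (many clients yield chunks that are not exactly one token, so treat the numbers as approximate).

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream and report time to first token and decode speed.

    `tokens` is assumed to be a streaming iterator from your own client.
    `clock` is injectable so the function can be tested without real waiting.
    """
    start = clock()                 # the request is assumed to start here
    first: Optional[float] = None   # timestamp of the first token
    last = start                    # timestamp of the most recent token
    count = 0

    for _ in tokens:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1

    if first is None:
        # Nothing was generated, so there is nothing to measure.
        return {"tokens": 0, "ttft_s": None, "decode_tps": None}

    # Decode speed excludes the wait for the first token (prompt processing).
    decode_time = last - first
    decode_tps = (count - 1) / decode_time if decode_time > 0 else None
    return {"tokens": count, "ttft_s": first - start, "decode_tps": decode_tps}
```

Reporting TTFT separately from decode speed matters here: two setups can show similar tokens per second while one of them spends far longer on prompt processing before the first token appears.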

Are local LLM users testing prompt injection before connecting models to tools?

Posted by sunychoudhary@reddit | LocalLLaMA | View on Reddit | 32 comments

LeMochileiro@reddit

Ever since a tool agent almost wrecked my local system while I was just running some tests, I only run agents inside containers. It doesn't solve all the problems, but if the agent messes up, the damage stays within a very limited scope.
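To give an idea of what "limited scope" can look like, here is a minimal sketch that builds a locked-down `docker run` command for a tool call. The function name, image, mount path, and resource limits are my own illustrative assumptions, not a recommended baseline; the flags themselves are standard Docker CLI options.

```python
from typing import List


def sandbox_command(image: str, tool_cmd: List[str], workdir: str) -> List[str]:
    """Build a restrictive `docker run` argument list for one tool invocation.

    All limits below are illustrative assumptions; tune them for your setup.
    """
    return [
        "docker", "run", "--rm",                # throw the container away afterwards
        "--network", "none",                    # no network access at all
        "--read-only",                          # root filesystem is read-only
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--pids-limit", "256",                  # cap the number of processes
        "--memory", "2g",                       # cap memory usage
        "--tmpfs", "/tmp",                      # writable scratch space in RAM
        "-v", f"{workdir}:/workspace",          # the only host directory exposed
        "-w", "/workspace",
        image, *tool_cmd,
    ]
```

The returned list can be passed to `subprocess.run`. Note that `--network none` also cuts the container off from a local model server, so an agent that has to call the LLM from inside the container would need a restricted network instead.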

Is Qwen3.6 current king for local agentic use?

Posted by HornyGooner4402@reddit | LocalLLaMA | View on Reddit | 150 comments

Could Open Models be trained to secretly go rogue?

Posted by nunodonato@reddit | LocalLLaMA | View on Reddit | 54 comments

Qwen 3.6 27B MTP speed on 3080ti (getting 4.5 t/s)

Posted by yehiaserag@reddit | LocalLLaMA | View on Reddit | 31 comments

What frontend do you guys use?

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 92 comments

LeMochileiro@reddit

Since the models run on a different machine with a dedicated GPU (a node in a K8s/K3s cluster), I need more flexibility to run different models with different configurations and parameters, without having to access the machine via SSH or create containers/pods all the time. LocalAI has been serving me quite well lately. It lets me deploy different backends (like llama.cpp or Vulkan), download different Hugging Face LLMs, change parameters, check usage, and easily integrate with tools like OpenCode. Everything is done directly through the frontend. I'm open to exploring alternatives, but it has to run in Docker/a container for me to be able to run it on Kubernetes.

Qwen 3.7 Max

Posted by Sicarius_The_First@reddit | LocalLLaMA | View on Reddit | 77 comments