FluoroquinolonesKill

google/gemma-4-12B · Hugging Face

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 285 comments

Gemma 4 MTP released

Posted by rerri@reddit | LocalLLaMA | View on Reddit | 301 comments

Local AI is the best

Posted by fake_agent_smith@reddit | LocalLLaMA | View on Reddit | 60 comments

FluoroquinolonesKill@reddit

This. That is probably what is motivating bro to work like that to begin with. Bro needs to do serious self examination. Source: me, an intellectual, judging people for making mistakes I recently learned to stop making.

If it works - don’t touch it: COMPETITION

Posted by awfulalexey@reddit | LocalLLaMA | View on Reddit | 112 comments

Gemma 4 - lazy model or am I crazy? (bit of a rant)

Posted by Pyrenaeda@reddit | LocalLLaMA | View on Reddit | 151 comments

Gemma 4 31B vs Qwen 3.5 27B: Which is best for long context workflows? My THOUGHTS...

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 174 comments

More Gemma4 fixes in the past 24 hours

Posted by andy2na@reddit | LocalLLaMA | View on Reddit | 120 comments

Gemma 4 on Llama.cpp should be stable now

Posted by ilintar@reddit | LocalLLaMA | View on Reddit | 167 comments

I think my Gemma4 is having a breakdown

Posted by MrSilencerbob@reddit | LocalLLaMA | View on Reddit | 20 comments

FluoroquinolonesKill@reddit

Yeah Gemma was not having it when I tried to tell it what today’s date is. That seems like it should be something that any model should be able to accept. Hopefully it gets ironed out.

It looks like we’ll need to download the new Gemma 4 GGUFs

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 147 comments

so…. Qwen3.5 or Gemma 4?

Posted by MLExpert000@reddit | LocalLLaMA | View on Reddit | 121 comments

Gemma 4 has been abliterated

Posted by coder3101@reddit | LocalLLaMA | View on Reddit | 26 comments

Gemma 4 has been released

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 702 comments

#OpenSource4o Movement Trending on Twitter/X - Release Opensource of GPT-4o

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 183 comments

I fine-tuned Qwen3.5-27B with 35k examples into an AI companion - after 2,000 conversations here’s what actually matters for personality

Posted by Crypto_Stoozy@reddit | LocalLLaMA | View on Reddit | 59 comments

Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA

Posted by pigeon57434@reddit | LocalLLaMA | View on Reddit | 152 comments

webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 59 comments

FluoroquinolonesKill@reddit

No. In the chat turn, there’s the error message, and there’s a little arrow to expand it and then an option to enable the proxy. It took me 15 minutes this morning to find it, because I was not expecting to have to enable the option there. And, that was even after I passed the flag to enable the proxy.

webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 59 comments

FluoroquinolonesKill@reddit

Thanks! Possible bug: When I enable an MCP server in the global settings, the setting is not remembered. When I start a new chat, I have to re-enable the MCP server either in the chat or in the global settings. I.e., starting a new chat and then inspecting the global settings shows the MCP server disabled, even though it was previously enabled.

Qwen3.5 35b UD Q4 K XL Prior to 3/5 worked great, now not so much...

Posted by thejacer@reddit | LocalLLaMA | View on Reddit | 19 comments

Qwen3.5 "Low Reasoning Effort" trick in llama-server

Posted by coder543@reddit | LocalLLaMA | View on Reddit | 22 comments

FluoroquinolonesKill@reddit

This works when I set it in the WebUI, but it does not work when I try to pass the parameters in the .ini file like this:

`logit-bias = 248069+13.3`

`grammar = "root ::= pre <[248069]> post\npre ::= !<[248069]>*\npost ::= !<[248069]>*"`

Any ideas?
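One way to narrow this down is to send the same two settings directly in a request to llama-server, bypassing the .ini file. Below is a minimal sketch, assuming the server's `/completion` endpoint accepts `logit_bias` as a list of `[token_id, bias]` pairs and `grammar` as a GBNF string. The helper name and the prompt are hypothetical; the token ID and the grammar are copied from the comment above.

```python
import json

# Token ID and bias value taken from the comment above.
TARGET_TOKEN = 248069
BIAS = 13.3

# The same grammar as in the comment, with real newlines between the rules.
GRAMMAR = (
    "root ::= pre <[248069]> post\n"
    "pre ::= !<[248069]>*\n"
    "post ::= !<[248069]>*"
)


def build_completion_payload(prompt: str) -> dict:
    """Build a request body carrying the logit bias and the grammar (hypothetical helper)."""
    return {
        "prompt": prompt,
        "logit_bias": [[TARGET_TOKEN, BIAS]],
        "grammar": GRAMMAR,
    }


payload = build_completion_payload("Hello")
body = json.dumps(payload)  # JSON text that could be POSTed to /completion
print(body)
```

If the request form works and the .ini form does not, one thing worth checking (an untested guess, not a confirmed cause) is whether the `\n` sequences inside the quoted .ini value reach the grammar parser as real newlines or as a literal backslash followed by `n`.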

Qwen 3.5 27-35-122B - Jinja Template Modification (Based on Bartowski's Jinja) - No thinking by default - straight quick answers, need thinking? simple activation with "/think" command anywhere in the system prompt.

Posted by -Ellary-@reddit | LocalLLaMA | View on Reddit | 26 comments

Qwen 3.5 Jinja Template – Restores Qwen /no_thinking behavior!

Posted by Substantial_Swan_144@reddit | LocalLLaMA | View on Reddit | 14 comments

You can use Qwen3.5 without thinking

Posted by guiopen@reddit | LocalLLaMA | View on Reddit | 86 comments

Nemo 30B is insane. 1M+ token CTX on one 3090

Posted by Dismal-Effect-1914@reddit | LocalLLaMA | View on Reddit | 112 comments

ACE-Step-1.5 has just been released. It’s an MIT-licensed open source audio generative model with performance close to commercial platforms like Suno

Posted by iGermanProd@reddit | LocalLLaMA | View on Reddit | 138 comments

GLM 4.7 Flash: Huge performance improvement with -kvu

Posted by TokenRingAI@reddit | LocalLLaMA | View on Reddit | 72 comments

KV cache fix for GLM 4.7 Flash

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 73 comments

Quiet Threadripper AI Workstation - 768GB DDR5 and 160GB VRAM (RTX 5090 + 4x R9700)

Posted by sloptimizer@reddit | LocalLLaMA | View on Reddit | 99 comments

GLM 4.7 Flash official support merged in llama.cpp

Posted by ayylmaonade@reddit | LocalLLaMA | View on Reddit | 64 comments

FluoroquinolonesKill@reddit

First impression: Running with the llama.cpp WebUI. reasoning-budget = 0 disables the reasoning. I am using temp = 1.0, top-k = 64, min-p = 0.00, top-p = 0.95, and dry-multiplier = 1.1. I am impressed with its ability to do role play and therapy. I have not seen any GPT slop, e.g. "it's not x, but y." I am getting about 8 t/s with flash attention off. Hopefully the speed improves. This might be a great candidate for fine tuning for role play.
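The settings listed above can also be passed on the llama-server command line. This is a minimal sketch that turns them into an argument list, assuming each setting maps to the flag of the same name (`--reasoning-budget`, `--temp`, `--top-k`, `--min-p`, `--top-p`, `--dry-multiplier`). The model path `model.gguf` and the helper name are placeholders, not something from the comment.

```python
# Sampling settings quoted in the comment above.
SETTINGS = {
    "reasoning-budget": 0,
    "temp": 1.0,
    "top-k": 64,
    "min-p": 0.0,
    "top-p": 0.95,
    "dry-multiplier": 1.1,
}


def to_cli_args(settings: dict) -> list:
    """Turn {"top-k": 64} into ["--top-k", "64"] (hypothetical helper)."""
    args = []
    for name, value in settings.items():
        args += [f"--{name}", str(value)]
    return args


# "model.gguf" is a placeholder path.
command = ["llama-server", "-m", "model.gguf"] + to_cli_args(SETTINGS)
print(" ".join(command))
```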

If you dont think Ai is an emergency you are about to have issues...

Posted by CannyGardener@reddit | preppers | View on Reddit | 813 comments

FluoroquinolonesKill@reddit

> Thing is, AI can't remain this inexpensive forever.

Yes it can. Local AI is free aside from fixed setup costs. Small local models are extremely useful for the types of tasks OP said he automated. People don’t realize how powerful local AI can be, even on consumer hardware.

> IMO the AI bubble collapsing is a far greater concern than AI taking all our jobs. That bubble is far larger than the housing bubble was, and is the only part of the US's GDP that has seen growth in the last 6 months. Most of the money caught up in this is investors hoping to get a return, and it's increasingly clear that they won't be able to get that return.

There are significant differences between this and the housing bubble. A lot of the investment in AI is coming from actual cash that large companies have on hand, unlike the sub-prime derivatives market that popped the housing bubble.

Mistral Small Creative!?

Posted by LoveMind_AI@reddit | LocalLLaMA | View on Reddit | 22 comments

My little decentralized Locallama setup, 216gb VRAM

Posted by Goldkoron@reddit | LocalLLaMA | View on Reddit | 154 comments

FluoroquinolonesKill@reddit

Macs are boutique products aimed at receptions of hair salons. They are used by bouffanted ponce gaylords who cannot handle anything more complex than one mouse button. They are hideously expensive and utterly restrictive. Real men use windows and get the fucking job done with raw power and unlimited options.

Mistral 3 14b against the competition ?

Posted by EffectiveGlove1651@reddit | LocalLLaMA | View on Reddit | 25 comments

FluoroquinolonesKill@reddit

I think it is just naturally wild. Perhaps the recommended sampling values are not tight enough. Try playing with the sampling parameters. I tried cranking Top-K down to 5, and the results are much more controlled and coherent. Here is a helpful link that lets you play with sampling parameters in isolation to see what they do: [https://artefact2.github.io/llm-sampling/index.xhtml](https://artefact2.github.io/llm-sampling/index.xhtml)
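To illustrate why a small Top-K tightens the output: Top-K keeps only the K highest-scoring candidate tokens and renormalizes their probabilities, so low-probability "wild" tokens can never be picked. The sketch below is a simplified stand-alone version with made-up logits, not llama.cpp's actual sampler code.

```python
import math


def top_k_probs(logits, k):
    """Keep the k highest logits and return {token_index: probability} over them."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    # Softmax over the surviving tokens only (shifted by the peak for stability).
    weights = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


# Made-up logits for a six-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1, -1.0, -3.0]

wide = top_k_probs(logits, k=6)    # every token stays eligible
narrow = top_k_probs(logits, k=2)  # only tokens 0 and 1 remain

print(sorted(wide))
print(sorted(narrow))
```

With `k=2` the remaining probability mass is split between the two strongest candidates, which is the "more controlled" behavior described above.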

My experiences with the new Ministral 3 14B Reasoning 2512 Q8

Posted by egomarker@reddit | LocalLLaMA | View on Reddit | 106 comments

FluoroquinolonesKill@reddit

I added it as a system prompt. I placed it right before my main system prompt. Unsloth has one with a little different formatting. Here is Unsloth's:

`<s>[SYSTEM_PROMPT]# HOW YOU SHOULD THINK AND ANSWER`

`First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.`

`Your thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response to the user.[/THINK]Here, provide a self-contained response.[/SYSTEM_PROMPT][INST]What is 1+1?[/INST]2</s>[INST]What is 2+2?[/INST]`

I was able to get it working - for now - using this portion of Unsloth's, which again, is placed right before my main system prompt.

`[SYSTEM_PROMPT]# HOW YOU SHOULD THINK AND ANSWER`

`First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.`

`Your thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response to the user.[/THINK]Here, provide a self-contained response.[/SYSTEM_PROMPT]`
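For anyone scripting this instead of pasting it into a UI, the workaround amounts to prepending the reasoning block to the existing system prompt. A minimal sketch: the prompt text is the portion quoted above, while the blank lines between its parts, the helper name, and the example main prompt are my assumptions.

```python
# Reasoning block quoted in the comment above. The blank lines between the
# three parts are an assumption; the original line breaks were not preserved.
REASONING_PROMPT = (
    "[SYSTEM_PROMPT]# HOW YOU SHOULD THINK AND ANSWER\n\n"
    "First draft your thinking process (inner monologue) until you arrive at a response. "
    "Format your response using Markdown, and use LaTeX for any mathematical equations. "
    "Write both your thoughts and the response in the same language as the input.\n\n"
    "Your thinking process must follow the template below:"
    "[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. "
    "Be as casual and as long as you want until you are confident to generate "
    "the response to the user."
    "[/THINK]Here, provide a self-contained response.[/SYSTEM_PROMPT]"
)


def build_system_prompt(main_prompt: str) -> str:
    """Place the reasoning block right before the main system prompt (hypothetical helper)."""
    return REASONING_PROMPT + "\n\n" + main_prompt


# Hypothetical main system prompt, for illustration only.
system_prompt = build_system_prompt("You are a helpful creative writing partner.")
print(system_prompt)
```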

My experiences with the new Ministral 3 14B Reasoning 2512 Q8

Posted by egomarker@reddit | LocalLLaMA | View on Reddit | 106 comments

FluoroquinolonesKill@reddit

Thank you. That seems to work. It's kind of a workflow faff, having to add that for only this model. Hopefully future iterations will make it unnecessary. Perhaps other front ends can make this easier, but I am using llama.cpp's new Web UI, which is pretty basic.

My experiences with the new Ministral 3 14B Reasoning 2512 Q8

Posted by egomarker@reddit | LocalLLaMA | View on Reddit | 106 comments

FluoroquinolonesKill@reddit

I have been testing out Ministral-3 8b and 14b reasoning. I am really liking 14b a lot. I compared it to Gemma-3 12b and 27b, and I am liking Ministral-3 14b more. Ministral-3 might actually replace Gemma-3 for my RP/creative writing daily driver. I tried the reasoning model, but the reasoning seems broken. Something might be wrong with the chat template. Idk. All my tests have been with the reasoning model where it is just working without reasoning. I am about to try the instruct model to compare.

Mistral 3 Blog post

Posted by rerri@reddit | LocalLLaMA | View on Reddit | 173 comments

FluoroquinolonesKill@reddit

The reasoning models (8b and 14b) are not reasoning. Is there something wrong with the embedded chat template? I tried the Unsloth and MistralAI GGUFs from a few hours ago. I am using the latest llama.cpp.

Ministral-3 has been released

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 61 comments

FluoroquinolonesKill@reddit

The reasoning models (8b and 14b) are not reasoning. Is there something wrong with the embedded chat template? I tried the Unsloth and MistralAI GGUFs from a few hours ago. I am using the latest llama.cpp. It looks like Unsloth has updated the GGUFs as of 20 minutes ago. I am pulling them now and will report back to this comment.

I have a RTX5090 and an AMD AI MAX+ 95 128GB. Which benchmark do you want me to run?

Posted by foogitiff@reddit | LocalLLaMA | View on Reddit | 36 comments

FluoroquinolonesKill@reddit

Yeah, after some research, I came to the same conclusion. It is frustrating that I had to spend so much time researching to determine that this processor - which is marketed as an "AI processor" with tons of memory - is actually not great for dense models. I think I saw one person say they were getting 3 t/s with Gemma-3-27b on Strix Halo. I can get that on my 8GB Nvidia. I can also run Qwen3-30B-A3B just fine. The only thing Strix Halo offers me is the ability to run larger MoE models. Unless and until those models become the norm, Strix Halo is not a good buy for me. All that said, I am grateful for the people who have put Strix Halo through its paces and published their results. They have helped a lot of people make informed decisions.

Budget Hardware Recommendations (1.3k)

Posted by xxxmralbinoxxx@reddit | LocalLLaMA | View on Reddit | 5 comments

FluoroquinolonesKill@reddit

Thanks for your comment. I'm in a similar boat. I want to run Gemma3-27b and Mistral-Small-24b with full context. Would the 64GB (48GB VRAM) AI Max+395 handle that just fine? Would the 128GB (96GB VRAM) be overkill?

if open-webui is trash, whats the next best thing available to use?

Posted by Tricky_Reflection_75@reddit | LocalLLaMA | View on Reddit | 173 comments