Pristine_Income9554

Why don't we still have any games with AI agents used as NPC characters?

Posted by Another__one@reddit | LocalLLaMA | View on Reddit | 110 comments

Pristine_Income9554@reddit

https://preview.redd.it/w96nwkn4rx4h1.png?width=2560&format=png&auto=webp&s=530204052fddffee4c2e6b6d082fe1c8eeff93e2 4 days ago I started making one. The obvious problems: it's too expensive even if you let the model decide only 1/3 of things, and if you run it locally it's too slow. But I have some ideas for how to get around that.

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

Posted by Alternative-Cat-1347@reddit | LocalLLaMA | View on Reddit | 40 comments

Pristine_Income9554@reddit

Larger-than-default -b and -ub values will speed up prompt processing at the cost of cache size. If you don't load the full model into VRAM, you need to set one of these 3 flags (-cmoe, -ncmoe, or -ot). I haven't seen any news about llama.cpp automatically placing weights for MoE models.

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

Posted by Alternative-Cat-1347@reddit | LocalLLaMA | View on Reddit | 40 comments

Pristine_Income9554@reddit

`-cmoe`, `-ncmoe`, and `-ot` all do the same thing in different ways. I don't see any MTP flags. On my setup MTP is slower, so I can't recommend anything on that topic. My command: `./llama-server -m ..../Qwen3.6-35B-A3B-UD-Q4_K_M.gguf --alias "Qwen3.6-35B-A3B" --host 0.0.0.0 -t 8 -tb 12 -cmoe -b 2048 -ub 2048 --ctx-size 65536 --jinja -fa on -ctk q8_0 -ctv q8_0 --fit on --fit-target 248 --no-mmap --no-context-shift --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.00 --chat-template-kwargs '{"preserve_thinking": true}'`
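To illustrate how the three flags relate, here is a rough sketch rather than anything authoritative. It assumes the expert tensors in the GGUF are named like `blk.<layer>.ffn_up_exps`, `blk.<layer>.ffn_down_exps`, and `blk.<layer>.ffn_gate_exps`; the helper `expert_override_for_first_layers` is a made-up name for this example and is not part of llama.cpp. It only prints argument strings, it does not launch the server.

```shell
#!/bin/sh
# Sketch: three ways to keep MoE expert tensors in system RAM.
# ASSUMPTION: expert tensors are named blk.<N>.ffn_(up|down|gate)_exps in the GGUF.

# Hypothetical helper: build an -ot pattern that sends the expert tensors
# of layers 0..N-1 to the CPU buffer.
expert_override_for_first_layers() {
  n="$1"
  layers=""
  i=0
  while [ "$i" -lt "$n" ]; do
    # Join layer indices with "|" to form a regex alternation.
    layers="${layers:+$layers|}$i"
    i=$((i + 1))
  done
  printf 'blk\\.(%s)\\.ffn_(up|down|gate)_exps=CPU\n' "$layers"
}

# Option 1: all expert tensors stay on the CPU.
echo "option 1: -cmoe"
# Option 2: only the expert tensors of the first 3 layers stay on the CPU.
echo "option 2: -ncmoe 3"
# Option 3: roughly the same placement as option 2, spelled out by hand.
echo "option 3: -ot $(expert_override_for_first_layers 3)"
```

The point is that `-cmoe` and `-ncmoe` are shortcuts, while `-ot` gives manual control over which tensors go where, so it is worth checking the actual tensor names in your model before relying on a hand-written pattern.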

Qwen3.6-35B - Terrible instruction following when using context files (with vanilla pi-agent). Model issue or am I doing something wrong?

Posted by FusionX@reddit | LocalLLaMA | View on Reddit | 9 comments

Qwen 3.6 35 UD 2 K_XL is pulling beyond its weight and quantization (No one is GPU Poor now)

Posted by dreamai87@reddit | LocalLLaMA | View on Reddit | 68 comments

Pristine_Income9554@reddit

You two messed up somewhere. https://preview.redd.it/8km75phykrvg1.png?width=503&format=png&auto=webp&s=ccfe714215e0cf0e52109ecb84ba4b525233f074 Generation 21-17 t/s, prompt processing ~578.82 tokens/s on an RTX 2060 6GB.

```
-m Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf --alias "Qwen3.6-35B-A3B" --host 0.0.0.0 -cmoe -b 2048 -ub 2048 --ctx-size 65536 --jinja -fa on -ctk q8_0 -ctv q8_0 --fit on --fit-target 128 --no-mmap --no-context-shift --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.00 -np 1 --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64
```

Qwen3.5-35B-A3B-Heretic running surprisingly fast on RTX 3060 Ti 8GB - is Heretic castrated compared to original?

Posted by Temporary-Lack-1408@reddit | LocalLLaMA | View on Reddit | 47 comments

Pristine_Income9554@reddit

You should have way better numbers. Try `-t 6 -tb 12 -ngl 999 -cmoe -b 2048 -ub 2048 --ctx-size 65536 --jinja -fa on -ctk q8_0 -ctv q8_0 --fit on --fit-target 128 --no-mmap`

Improve Qwen3.5 Performance on Weak GPU

Posted by MarketingGui@reddit | LocalLLaMA | View on Reddit | 22 comments

Pristine_Income9554@reddit

`llama-server.exe -m D:\ggufModels\Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf --alias "Qwen3.5-35B-A3B" -t 6 -tb 12 -cmoe -b 2048 -ub 2048 --ctx-size 65536 --jinja -fa on -ctk q4_0 -ctv q4_0 --fit on --fit-target 64 -np 1 --no-mmap --no-context-shift` gives 12 t/s with an RTX 2060 6GB VRAM; 40GB RAM at 2936 MHz; Ryzen 7 2700X

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking?

Posted by thigger@reddit | LocalLLaMA | View on Reddit | 27 comments

GLM4.7-Flash REAP @ 25% live on HF + agentic coding evals

Posted by ilzrvch@reddit | LocalLLaMA | View on Reddit | 20 comments

Fix for GLM 4.7 Flash has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 91 comments

Pristine_Income9554@reddit

Who cares about a fix for something that can already be fixed with the flag `--override-kv deepseek2.expert_gating_func=int:2`? The OP's title is deceptive, as the main problem with GLM 4.7 Flash is broken flash attention.

glm-4.7-flash has the best thinking process with clear steps, I love it

Posted by uptonking@reddit | LocalLLaMA | View on Reddit | 38 comments

My gpu poor comrades, GLM 4.7 Flash is your local agent

Posted by __Maximum__@reddit | LocalLLaMA | View on Reddit | 169 comments

I fine-tuned a 7B model for reasoning on free Colab with GRPO + TRL

Posted by External-Rub5414@reddit | LocalLLaMA | View on Reddit | 2 comments

AI has replaced programmers… totally.

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 297 comments

ChatGPT stopped lying to me when I started treating it like a scared kid

Posted by Nan0pixel@reddit | LocalLLaMA | View on Reddit | 13 comments

Pristine_Income9554@reddit

It's the same thing as the tip meta. Models are trained on data from humans, and we are lazy as fuck. Thanks to laziness our civilization advances (to make hard work easier). And now we expect a thing trained on our data not to have our flaws?

Wan 2.1 1.3B fighting video is not as good as the Qwen 2.5 fighting videos I previously posted. I used the Wan 2.1 1.3B from Huge.com. Qwen 2.5 must be using some other type of super model for videos. Because this Wan has lost its' way.

Posted by Extension-Fee-8480@reddit | LocalLLaMA | View on Reddit | 10 comments

Exceeding VRAM limit with QWQ IQ3XXS i1 quant, no OOM? (LM studio)

Posted by No_Expert1801@reddit | LocalLLaMA | View on Reddit | 7 comments

Think Tool Boosts Accuracy by 54%! (+ Ollama integration)

Posted by Straight-Worker-4327@reddit | LocalLLaMA | View on Reddit | 21 comments

Pristine_Income9554@reddit

You're missing that this tool works with any good model in ollama without training. If a model was trained to work with function calling, it will work well not only with this *“think” tool*, but with search or RAG as well.

Qwen LIED TO US

Posted by random-tomato@reddit | LocalLLaMA | View on Reddit | 7 comments

Think Tool Boosts Accuracy by 54%! (+ Ollama integration)

Posted by Straight-Worker-4327@reddit | LocalLLaMA | View on Reddit | 21 comments

Pristine_Income9554@reddit

Even if we assume that full chat context plus a reasoning function call in the same call gives a better result, it's still just a function call, like RAG, internet search, or image generation, that tries to cheaply get a result similar to reasoning models. It's nothing new, just a stripped-down function call that only asks the model a question with a custom prompt.

Think Tool Boosts Accuracy by 54%! (+ Ollama integration)

Posted by Straight-Worker-4327@reddit | LocalLLaMA | View on Reddit | 21 comments

Pristine_Income9554@reddit

It's just the same reasoning thing wrapped inside function calling, so you don't need to train the model to output thinking and answer in one reply; instead you get two replies with a similar result.

Is the DeepSeek model poisoned at the data level?

Posted by aospan@reddit | LocalLLaMA | View on Reddit | 10 comments

Pristine_Income9554@reddit

It would be strange if they had datasets not aligned with CCP policies. The model was not created for the personal use of Western people. You can run it, but don't expect it to have the same worldview as the West.

1 Million Token Context Length 🔥

Posted by CelebrationClean7309@reddit | LocalLLaMA | View on Reddit | 39 comments

Opensource 8B parameter test time compute scaling(reasoning) model

Posted by TheLogiqueViper@reddit | LocalLLaMA | View on Reddit | 36 comments

It's getting difficult to evaluate models.

Posted by baehyunsol@reddit | LocalLLaMA | View on Reddit | 52 comments

KoboldcPP is such a gigantic leap in QoL coming from Oobabooga is just ridiculous.

Posted by pumukidelfuturo@reddit | LocalLLaMA | View on Reddit | 58 comments

6 bit quantization

Posted by Ok-Cicada-5207@reddit | LocalLLaMA | View on Reddit | 9 comments

Is LLM Studio good?

Posted by Top_Sonic@reddit | LocalLLaMA | View on Reddit | 91 comments

Tumera 0.1.0a2 is here!

Posted by Sad-Fix-7915@reddit | LocalLLaMA | View on Reddit | 9 comments

Pristine_Income9554@reddit

The lifecycle of software and code is so much shorter compared to the time a lot of those patterns were invented; the next update or the next technology or the next best pattern could come out tomorrow, making your effort be in vain. You will 100% not see a new architecture like MVVM in the next 2-3 years, because it works not only on PC but also on phones using MAUI (the new Xamarin), and you don't need to reinvent the wheel, like with MVC for ASP.NET.

Tumera 0.1.0a2 is here!

Posted by Sad-Fix-7915@reddit | LocalLLaMA | View on Reddit | 9 comments

Pristine_Income9554@reddit

John Gossman, a Microsoft WPF and Silverlight architect, announced MVVM on his blog in **2005**. Model–view–viewmodel is also referred to as model–view–binder, especially in implementations not involving the .NET platform.

Tumera 0.1.0a2 is here!

Posted by Sad-Fix-7915@reddit | LocalLLaMA | View on Reddit | 9 comments

Pristine_Income9554@reddit

You will not learn how to drive a car by riding a bicycle. Don't waste your time. If you just want to finish the app then you don't really need MVVM, but if you want to learn, please start straight with MVVM, because without it you will pick up bad habits. For example, having a bunch of app backend code in the frontend ChatPage.xaml.cs.

Tumera 0.1.0a2 is here!

Posted by Sad-Fix-7915@reddit | LocalLLaMA | View on Reddit | 9 comments

Pristine_Income9554@reddit

1. You need to learn MVVM
2. This will help: [https://marketplace.visualstudio.com/items?itemName=TemplateStudio.TemplateStudioForWinUICs](https://marketplace.visualstudio.com/items?itemName=TemplateStudio.TemplateStudioForWinUICs)

Handy calculator for figuring out how much VRAM you need for a specific model + context window

Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 7 comments

Run Qwen 2.5, Qwen 2.5-Coder, Qwen 2.5-Math, and Other LMs in GGUF Format from HF 🤗 Locally

Posted by unseenmarscai@reddit | LocalLLaMA | View on Reddit | 18 comments
