LegacyRemaster

nex-agi/Nex-N2-mini • Huggingface

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 16 comments

Does anyone have news about the next GLM or Kimi model?

Posted by ihatebeinganonymous@reddit | LocalLLaMA | View on Reddit | 12 comments

Skip Nvidia New Spark Laptops?

Posted by Hannibalj2ca@reddit | LocalLLaMA | View on Reddit | 45 comments

LegacyRemaster@reddit

If I could connect RTX 6000 and 5000 graphics cards to this new platform, it would be a game-changer for local inference: fast RAM and very fast VRAM. But as is the case with Strix Halo, I think it will remain a dream.

Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3

Posted by valtor2@reddit | LocalLLaMA | View on Reddit | 4 comments

LegacyRemaster@reddit

I currently love Mimo 2.5 Pro for how rarely it hallucinates. I run 2.5 locally. On Opencode, 2.5 is free, as is Deepseek 4 Flash. The problem is the use case: for coding, DS4 is fine. But if you need to ask about anything in the humanities (texts, explanations, news, etc.), it has one of the highest hallucination rates.

Qwen 3.7 Plus just briefly appeared and then disappeared on OpenRouter.

Posted by ihatebeinganonymous@reddit | LocalLLaMA | View on Reddit | 28 comments

Calling it now Microsoft is buying Unsloth.

Posted by Wrong_Mushroom_7350@reddit | LocalLLaMA | View on Reddit | 336 comments

LegacyRemaster@reddit

There's nothing wrong with getting paid for your hard work. You can be open and receive money, and that's natural. Never justify yourself for it: we understand.

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 20 comments

LegacyRemaster@reddit

Install Opencode, use its Deepseek 4 Flash, which is free, and ask it to create a web app with an LMstudio-style chat that interfaces with llama-server and has persistent storage. Then have fun adding features. It's free.
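
To give an idea of how small the core of such an app can be, here is a minimal sketch of the "chat plus persistent storage" part in Python. It assumes llama-server is running locally on its default port 8080 and exposing its OpenAI-compatible `/v1/chat/completions` endpoint. The SQLite schema, the file name `chat.db`, and the function names are hypothetical choices for illustration, not something from the original comment.

```python
import json
import sqlite3
import urllib.request

# Assumption: llama-server is running locally with its OpenAI-compatible API.
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"


def open_store(path="chat.db"):
    """Open (or create) the SQLite file that keeps the conversation."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def save_message(conn, role, content):
    """Append one message to the persistent history."""
    conn.execute(
        "INSERT INTO messages (role, content) VALUES (?, ?)", (role, content)
    )
    conn.commit()


def load_history(conn):
    """Return the full history in the shape the chat endpoint expects."""
    rows = conn.execute(
        "SELECT role, content FROM messages ORDER BY id"
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]


def chat(conn, user_text, url=SERVER_URL):
    """Store the user message, send the whole history, store the reply."""
    save_message(conn, "user", user_text)
    payload = json.dumps({"messages": load_history(conn)}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)["choices"][0]["message"]["content"]
    save_message(conn, "assistant", reply)
    return reply
```

With the server up, `conn = open_store()` followed by `print(chat(conn, "Hello"))` would send the stored history plus the new message, and the conversation survives a restart because it lives in the SQLite file. A web UI on top of this is the part to hand over to the coding agent.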

ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI by allozaur · Pull Request #23434 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 9 comments

next MiniMax will be released in ~10 Days

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 52 comments

LegacyRemaster@reddit

These days I'm using claude superskill with vscode+claudecode. Qwen 3.6 27b is my daily driver. Then I go down to 35b moe for fast things and go up to Mimo 2.5 for complex things.

Stepfun 3.7 Flash is very good

Posted by -dysangel-@reddit | LocalLLaMA | View on Reddit | 89 comments

LegacyRemaster@reddit

https://preview.redd.it/74moo3qnlh4h1.png?width=3825&format=png&auto=webp&s=49184368c51dd1f42fa59f30aa77f622a9a489bd Qwen 3.7 max.... I hope 3.7 27b will be out sooooon

Stepfun 3.7 Flash is very good

Posted by -dysangel-@reddit | LocalLLaMA | View on Reddit | 89 comments

LegacyRemaster@reddit

It depends on what you need to do. I use qwen 3.6 27b with vscode+claude code, and with stepfun 3.7, unfortunately, the thinking cycles are too long. So, it's true that I don't pay for tokens in terms of money, but I do pay for them in time wasted waiting for responses.

My home data center

Posted by alecKarfonta@reddit | LocalLLaMA | View on Reddit | 86 comments

Cost Analysis of my $6.4k Local LLM Server

Posted by 1ncehost@reddit | LocalLLaMA | View on Reddit | 73 comments

LegacyRemaster@reddit

I keep reiterating that API costs must be calculated carefully:

1) RTX 6000 + 2x W7800 48GB: 300W + 200W + 200W (I lowered the voltage on all of them). The whole system consumes about 900W at full load, which rarely happens.
2) I use the local system for coding (vscode + claude or kilo or opencode or cline), video creation, image creation, music creation, and meshes.
3) How many API subscriptions would I have to buy, and how many tokens would I have to pay for, to do what I do locally for only the cost of electricity? In the winter, I also save on heating.
4) My workstations have already paid for themselves with the products I sell.

100% privacy, I can use "heretical" films to generate content I can't generate online (try writing technical reports on military systems, for example), I'm not tracked... I think it's a great investment to create your own infrastructure. Furthermore, if I sold the entire setup at today's price, I'd get at least $4,000 more than I paid for it.
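
The comparison in point 3 can be sketched as simple arithmetic. The snippet below is only an illustration: the 900W figure comes from the comment above, while the electricity price, the hours of use, and the API price per million tokens are hypothetical placeholders to replace with your own numbers.

```python
def electricity_cost(power_watts, hours, price_per_kwh):
    """Cost of running a load of `power_watts` for `hours` at `price_per_kwh`."""
    return power_watts / 1000 * hours * price_per_kwh


def break_even_tokens(power_watts, hours, price_per_kwh, api_price_per_million):
    """Number of API tokens the same money would buy at a flat per-token price."""
    cost = electricity_cost(power_watts, hours, price_per_kwh)
    return cost / api_price_per_million * 1_000_000


# 900W is the full-load draw quoted above; everything else is an assumption.
FULL_LOAD_WATTS = 900
HOURS_PER_DAY = 8            # hypothetical daily usage at full load
PRICE_PER_KWH = 0.30         # hypothetical electricity price
API_PRICE_PER_MILLION = 3.0  # hypothetical blended API price per 1M tokens

daily_cost = electricity_cost(FULL_LOAD_WATTS, HOURS_PER_DAY, PRICE_PER_KWH)
daily_tokens = break_even_tokens(
    FULL_LOAD_WATTS, HOURS_PER_DAY, PRICE_PER_KWH, API_PRICE_PER_MILLION
)
print(f"Electricity per day: {daily_cost:.2f}")
print(f"API tokens that money would buy: {daily_tokens:,.0f}")
```

If the local system produces more tokens per day than the second number, electricity alone is cheaper than the API under these assumptions. Hardware purchase price, resale value, and the heating saving mentioned above are deliberately left out of this sketch.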

Is he crazy to say that?

Posted by pmv143@reddit | LocalLLaMA | View on Reddit | 203 comments

Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 135 comments

Under 3 second time to first token, I literally don’t know what to add or do next for my local LLM. Can I get some input on ways to improve it?

Posted by Fear_ltself@reddit | LocalLLaMA | View on Reddit | 15 comments

vLLM PR adding native HIP W4A16 kernel was merged

Posted by StupidityCanFly@reddit | LocalLLaMA | View on Reddit | 11 comments

llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 78 comments

StepFun 3.7 Flash - Speed Benchmark in M5 Max

Posted by Beamsters@reddit | LocalLLaMA | View on Reddit | 13 comments

LegacyRemaster@reddit

https://preview.redd.it/ibpmzhdxr14h1.png?width=2072&format=png&auto=webp&s=1e9c5c7f4ad385d91e9fa5f1fbe30aa88ea3c32e OK, fast, it's fast. We'll see how it does with long context.

StepFun 3.7 Flash

Posted by Everlier@reddit | LocalLLaMA | View on Reddit | 151 comments

I've just benchmarked myself:

Posted by JLeonsarmiento@reddit | LocalLLaMA | View on Reddit | 171 comments

The frontier reasoning race is starting to look like a crowded subway station

Posted by ExoticYesterday8282@reddit | LocalLLaMA | View on Reddit | 63 comments

MiniMax M3 Is Coming Up

Posted by Few_Painter_5588@reddit | LocalLLaMA | View on Reddit | 7 comments

LegacyRemaster@reddit

The reason I switched to mimo 2.5 was that prefill above 100,000 tokens was too slow on MiniMax. If they fixed this issue with the new attention, it would be great news.

SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More

Posted by CuriousPlatypus1881@reddit | LocalLLaMA | View on Reddit | 41 comments

Looks like Miminax-M3 is just around the corner

Posted by OnkelBB@reddit | LocalLLaMA | View on Reddit | 40 comments

LegacyRemaster@reddit

We'll see what license it will be distributed under, what size it will be, and what inference/prefill speed it will have. I've currently replaced it with Mimo 2.5, with no regrets.

Stop pretending self-hosting is cheaper. It's not. We do it for different reasons and we should say so.

Posted by Napster3301@reddit | LocalLLaMA | View on Reddit | 88 comments

LegacyRemaster@reddit

The reasoning behind the specific use case is the fun part. If I consider that I can run image, audio, and video generators, and that when I burn 2-3 million tokens coding on a project that is later discarded I don't have to ask myself whether the development or modification is worth the risk, I'd say I paid off my video cards in less than a year.

server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 40 comments

Coding Agent Tier List After Using These Across Real Production Codebases

Posted by Cute_Dragonfruit4738@reddit | LocalLLaMA | View on Reddit | 7 comments

Gemma is so much better than Qwen, prove me wrong

Posted by Mountain_Patience231@reddit | LocalLLaMA | View on Reddit | 62 comments

397B competitor that fits in 256 RAM?

Posted by quietsubstrate@reddit | LocalLLaMA | View on Reddit | 53 comments

LegacyRemaster@reddit

start C:\llm\llamamimo\build\bin\Release\llama-server.exe --model "H:\gptmodel\AesSedai\MiMo-V2.5-GGUF\MiMo-V2.5-IQ3_S-00001-of-00004.gguf" --ctx-size 1048576 --threads 16 --host 127.0.0.1 --no-mmap --jinja --fit on --flash-attn on -sm layer --n-cpu-moe 0 --threads 16 --parallel 1 --temp 1 --repeat_penalty 1.0 --min_p 0.02 --presence_penalty 0.0 --mmproj H:\gptmodel\AesSedai\MiMo-V2.5-GGUF\mmproj-MiMo-V2.5-F32.gguf

Running on Vulkan: RTX 6000 96GB + W7800 48GB.

DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 119 comments

AMD Powers Next-Generation Agent Computers with New Ryzen AI Halo Developer Platform and Ryzen AI Max PRO 400 Series Processors

Posted by Baumpaladin@reddit | LocalLLaMA | View on Reddit | 66 comments

Waiting for Qwen 3.7 open weight... The new King has arrived...

Posted by LegacyRemaster@reddit | LocalLLaMA | View on Reddit | 284 comments

We're Thursday and no one claimed AGI yet this week!

Posted by oodelay@reddit | LocalLLaMA | View on Reddit | 70 comments

Re. what ever happened to Cohere’s Command-A series of models?

Posted by nick_frosst@reddit | LocalLLaMA | View on Reddit | 102 comments

Guardrails take an 8B model from 53% to 99% on agentic tasks [ACM CAIS '26 preprint]

Posted by billy_booboo@reddit | LocalLLaMA | View on Reddit | 14 comments
