LegacyRemaster

nex-agi/Nex-N2-mini • Huggingface

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 16 comments

Does anyone have news about the next GLM or Kimi model?

Posted by ihatebeinganonymous@reddit | LocalLLaMA | View on Reddit | 12 comments

LegacyRemaster@reddit

We're waiting for Qwen and GLM too, but yeah... no news right now.

Skip Nvidia New Spark Laptops?

Posted by Hannibalj2ca@reddit | LocalLLaMA | View on Reddit | 45 comments

If I could connect RTX 6000 and 5000 graphics cards to this new platform, it would be a game-changer for local inference: fast RAM plus very fast VRAM. But as with Strix Halo, I think it will remain a dream.

Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3

Posted by valtor2@reddit | LocalLLaMA | View on Reddit | 4 comments

LegacyRemaster@reddit

I currently love Mimo 2.5 Pro because it hallucinates so little. I run 2.5 locally. On Opencode, 2.5 is free, as is Deepseek 4 Flash. The problem is the use case: for coding, DS4 is fine, but if you ask it for anything in the humanities (texts, explanations, news, etc.), it's at the top of the charts for hallucination rate.

Qwen 3.7 Plus just briefly appeared and then disappeared on OpenRouter.

Posted by ihatebeinganonymous@reddit | LocalLLaMA | View on Reddit | 28 comments

LegacyRemaster@reddit

classic

LegacyRemaster@reddit

[https://artificialanalysis.ai/models/qwen3-7-plus](https://artificialanalysis.ai/models/qwen3-7-plus) It appeared here too, but unfortunately not on HF.

Calling it now Microsoft is buying Unsloth.

Posted by Wrong_Mushroom_7350@reddit | LocalLLaMA | View on Reddit | 336 comments

LegacyRemaster@reddit

There's nothing wrong with getting paid for your hard work. You can be open and still earn money; that's natural. Never feel you have to justify it: we understand.

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 20 comments

LegacyRemaster@reddit

Install Opencode, use its Deepseek 4 Flash (which is free), and ask it to build a web app with an LM Studio-style chat that talks to llama-server and keeps persistent storage. Then have fun adding features. It's free.
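For the persistent-storage part of such an app, a minimal sketch could keep the chat as OpenAI-style message dicts in a local JSON file. The file layout and the function names `load_history` and `append_message` are hypothetical, not taken from any tool mentioned here.

```python
import json
from pathlib import Path


def load_history(path: Path) -> list[dict]:
    """Return the saved chat messages, or an empty list if nothing is saved yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def append_message(path: Path, role: str, content: str) -> list[dict]:
    """Append one message and rewrite the whole file (fine for small chats)."""
    history = load_history(path)
    history.append({"role": role, "content": content})
    # Write to a temporary file first, then swap it in, so a crash
    # mid-write does not leave a truncated JSON file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)
    return history
```

The returned list has the shape expected by the `messages` field of llama-server's OpenAI-compatible `/v1/chat/completions` endpoint, so the web app can send it as-is.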

ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI by allozaur · Pull Request #23434 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 9 comments

LegacyRemaster@reddit

amazing!

next MiniMax will be released in ~10 Days

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 52 comments

LegacyRemaster@reddit

4M tokens ---> wrong

LegacyRemaster@reddit

These days I'm using claude superskill with VS Code + Claude Code. Qwen 3.6 27B is my daily driver; I drop down to the 35B MoE for quick tasks and move up to Mimo 2.5 for complex ones.

LegacyRemaster@reddit

The main difference is vision support and, finally, a 1M context like Mimo's.

LegacyRemaster@reddit

I think this version will require a lot of GPUs. I don't think it's 200b.

LegacyRemaster@reddit

We don't know the size yet, but the training data is larger. 27B vs. 200/300/600B makes a big difference.

Stepfun 3.7 Flash is very good

Posted by -dysangel-@reddit | LocalLLaMA | View on Reddit | 89 comments

LegacyRemaster@reddit

https://preview.redd.it/74moo3qnlh4h1.png?width=3825&format=png&auto=webp&s=49184368c51dd1f42fa59f30aa77f622a9a489bd Qwen 3.7 Max... I hope the 3.7 27B comes out sooooon.

LegacyRemaster@reddit

It depends on what you need to do. I use Qwen 3.6 27B with VS Code + Claude Code, and with Stepfun 3.7 the thinking cycles are unfortunately too long for me. So it's true that I don't pay for tokens in money, but I do pay for them in time spent waiting for responses.

LegacyRemaster@reddit

me too

My home data center

Posted by alecKarfonta@reddit | LocalLLaMA | View on Reddit | 86 comments

LegacyRemaster@reddit

sounds hot 😃

Cost Analysis of my $6.4k Local LLM Server

Posted by 1ncehost@reddit | LocalLLaMA | View on Reddit | 73 comments

LegacyRemaster@reddit

I keep repeating that API costs must be calculated carefully:

1) Hardware: RTX 6000 + 2x W7800 48GB, at 300W + 200W + 200W (I undervolted all of them). The system draws about 900W at full load (which rarely happens).
2) I use the local system for coding (VS Code + Claude, Kilo, Opencode, or Cline), video creation, image creation, music creation, and meshes.
3) How many API subscriptions would I have to buy, and how many tokens would I have to pay for, to do what I do locally with only the cost of electricity? In winter, I also save on heating.
4) My workstations have already paid for themselves through the products I sell. 100% privacy, I can use "heretic" (decensored) models to generate content I can't generate online (try writing technical reports on military systems, for example), and I'm not tracked... I think building your own infrastructure is a great investment. Furthermore, if I sold the entire setup at today's prices, I'd get at least $4,000 more than I paid for it.
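To put the electricity side of this into numbers, here is a rough sketch. Only the 900W full-load figure comes from the comment above; the idle draw, hours of use per day, share of time at full load, and price per kWh are hypothetical placeholders to replace with your own values.

```python
def monthly_energy_cost(
    full_load_watts: float,
    idle_watts: float,
    hours_per_day: float,
    full_load_fraction: float,
    price_per_kwh: float,
    days: int = 30,
) -> float:
    """Estimate the monthly electricity cost of a machine that alternates
    between full load and idle while it is switched on."""
    # Time-weighted average power draw while the machine is on.
    avg_watts = full_load_fraction * full_load_watts + (1 - full_load_fraction) * idle_watts
    kwh = avg_watts * hours_per_day * days / 1000  # W * h -> kWh
    return kwh * price_per_kwh


# Hypothetical inputs: 900 W peak, 150 W idle, on 10 h/day,
# 20% of that time at full load, 0.30 per kWh.
cost = monthly_energy_cost(900, 150, 10, 0.20, 0.30)
```

With these placeholder inputs the average draw is 300 W, which works out to 90 kWh and a cost of about 27 (in your currency) per month; that is the number to weigh against monthly API spend.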

Is he crazy to say that?

Posted by pmv143@reddit | LocalLLaMA | View on Reddit | 203 comments

LegacyRemaster@reddit

me too

Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 135 comments

LegacyRemaster@reddit

I think the guardrails rule applies to vibe coders and traditional devs alike.

Under 3 second time to first token, I literally don’t know what to add or do next for my local LLM. Can I get some input on ways to improve it?

Posted by Fear_ltself@reddit | LocalLLaMA | View on Reddit | 15 comments

LegacyRemaster@reddit

good... Another star from me

vLLM PR adding native HIP W4A16 kernel was merged

Posted by StupidityCanFly@reddit | LocalLLaMA | View on Reddit | 11 comments

LegacyRemaster@reddit

wait.... What?? 2x W7800 48gb ready to test

llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 78 comments

LegacyRemaster@reddit

sounds awesome

StepFun 3.7 Flash - Speed Benchmark in M5 Max

Posted by Beamsters@reddit | LocalLLaMA | View on Reddit | 13 comments

LegacyRemaster@reddit

https://preview.redd.it/ibpmzhdxr14h1.png?width=2072&format=png&auto=webp&s=1e9c5c7f4ad385d91e9fa5f1fbe30aa88ea3c32e OK, it's fast. We'll see how it does at long context.

LegacyRemaster@reddit

https://preview.redd.it/xq6o6pduq14h1.png?width=1396&format=png&auto=webp&s=f0054eca910a96285836ba0248c75bee146fd915 i'm lazy now

LegacyRemaster@reddit

Downloading. Will test Q4_K_S on RTX 6000 96GB + W7800 48GB.
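A quick way to sanity-check whether a quant will fit is parameter count times bits per weight. This sketch assumes roughly 4.5 bits per weight for Q4_K_S (an approximation: real GGUF files mix tensor types, so the actual file size is the better number) and uses the 196B parameter count mentioned for StepFun 3.7 Flash in this thread. KV cache and compute buffers come on top of the weights.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10**9 bytes)."""
    return params_billion * bits_per_weight / 8


def headroom_gb(params_billion: float, bits_per_weight: float, vram_gb: list[float]) -> float:
    """Total VRAM left for KV cache and buffers after loading the weights.
    A negative result means the weights alone do not fit."""
    return sum(vram_gb) - weights_gb(params_billion, bits_per_weight)


# 196B parameters at ~4.5 bits per weight (assumed) on RTX 6000 96GB + W7800 48GB.
left = headroom_gb(196, 4.5, [96, 48])
```

Under these assumptions the weights take about 110 GB, leaving roughly 34 GB across both cards for context. The check only looks at the total; with a layer split each card must also hold its own share.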

StepFun 3.7 Flash

Posted by Everlier@reddit | LocalLLaMA | View on Reddit | 151 comments

LegacyRemaster@reddit

196b ... heroes

I've just benchmarked myself:

Posted by JLeonsarmiento@reddit | LocalLLaMA | View on Reddit | 171 comments

LegacyRemaster@reddit

So... you're about 15B. Impressive.

The frontier reasoning race is starting to look like a crowded subway station

Posted by ExoticYesterday8282@reddit | LocalLLaMA | View on Reddit | 63 comments

LegacyRemaster@reddit

ahahah gpt4o ahahahahah

MiniMax M3 Is Coming Up

Posted by Few_Painter_5588@reddit | LocalLLaMA | View on Reddit | 7 comments

LegacyRemaster@reddit

The reason I switched to Mimo 2.5 was that prefill above 100,000 tokens was too slow. If they've fixed this with the new attention, it would be great news.
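Why prefill speed matters so much at long context comes down to simple arithmetic: time to first token is roughly prompt length divided by prefill speed, and the answer (including any thinking tokens) adds output length divided by generation speed. The speeds below are hypothetical placeholders, not measurements of any model mentioned here, and real prefill speed usually drops as the context grows, which this linear sketch ignores.

```python
def response_time_s(prompt_tokens: int, output_tokens: int,
                    prefill_tok_s: float, decode_tok_s: float) -> float:
    """Rough end-to-end latency: prompt processing plus token generation."""
    prefill = prompt_tokens / prefill_tok_s  # time to first token
    decode = output_tokens / decode_tok_s    # includes any thinking tokens
    return prefill + decode


# Hypothetical: 100k-token prompt, 2k output tokens, same generation speed.
slow = response_time_s(100_000, 2_000, prefill_tok_s=250, decode_tok_s=40)
fast = response_time_s(100_000, 2_000, prefill_tok_s=1_000, decode_tok_s=40)
```

With these placeholder numbers, a 4x faster prefill cuts the wait from 450 s to 150 s even though generation speed is unchanged, which is why prefill dominates the experience on large agentic prompts.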

SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More

Posted by CuriousPlatypus1881@reddit | LocalLLaMA | View on Reddit | 41 comments

LegacyRemaster@reddit

finally thx

Looks like Miminax-M3 is just around the corner

Posted by OnkelBB@reddit | LocalLLaMA | View on Reddit | 40 comments

LegacyRemaster@reddit

We'll see what license it will be distributed under, what size it is, and what inference/prefill speed it gets. I've currently replaced it with Mimo 2.5, with no regrets.

Stop pretending self-hosting is cheaper. It's not. We do it for different reasons and we should say so.

Posted by Napster3301@reddit | LocalLLaMA | View on Reddit | 88 comments

LegacyRemaster@reddit

Reasoning about the specific use case is fun. When I consider that I can use image generators, audio generators, and video generators, and that when I burn 2-3 million tokens coding a project that then gets discarded I don't have to agonize over whether to risk the development/modification, I'd say my video cards paid for themselves in less than a year.
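A claim like "paid for themselves in less than a year" is a break-even calculation: hardware cost divided by the monthly API spend you avoid, minus what the machine costs to run. All figures in this sketch (hardware price, avoided API spend, running cost) are hypothetical placeholders.

```python
import math


def payback_months(hardware_cost: float, avoided_api_per_month: float,
                   running_cost_per_month: float) -> float:
    """Months until the hardware pays for itself; inf if it never does."""
    monthly_saving = avoided_api_per_month - running_cost_per_month
    if monthly_saving <= 0:
        return math.inf  # running locally costs more than the API it replaces
    return hardware_cost / monthly_saving


# Hypothetical: 12,000 of hardware, 1,300/month of API spend avoided,
# 100/month of electricity.
months = payback_months(12_000, 1_300, 100)
```

With these placeholders the break-even point is 10 months. The result is very sensitive to the avoided-spend estimate, which is exactly the "how many tokens would I have to pay for" question.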

server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 40 comments

LegacyRemaster@reddit

classic 😃

Coding Agent Tier List After Using These Across Real Production Codebases

Posted by Cute_Dragonfruit4738@reddit | LocalLLaMA | View on Reddit | 7 comments

LegacyRemaster@reddit

fake news. Mimo 2.5 is top tier.

Gemma is so much better than Qwen, prove me wrong

Posted by Mountain_Patience231@reddit | LocalLLaMA | View on Reddit | 62 comments

LegacyRemaster@reddit

Prove? Connect vscode+cline and test 😃

397B competitor that fits in 256 RAM?

Posted by quietsubstrate@reddit | LocalLLaMA | View on Reddit | 53 comments

LegacyRemaster@reddit

60 tokens/sec on my hardware up to 100k tokens.

LegacyRemaster@reddit

`start C:\llm\llamamimo\build\bin\Release\llama-server.exe --model "H:\gptmodel\AesSedai\MiMo-V2.5-GGUF\MiMo-V2.5-IQ3_S-00001-of-00004.gguf" --ctx-size 1048576 --threads 16 --host 127.0.0.1 --no-mmap --jinja --fit on --flash-attn on -sm layer --n-cpu-moe 0 --parallel 1 --temp 1 --repeat-penalty 1.0 --min-p 0.02 --presence-penalty 0.0 --mmproj H:\gptmodel\AesSedai\MiMo-V2.5-GGUF\mmproj-MiMo-V2.5-F32.gguf`

Backend: Vulkan, on RTX 6000 96GB + W7800 48GB.
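For a client talking to a server launched this way, a minimal sketch of a request to llama-server's OpenAI-compatible `/v1/chat/completions` endpoint could pass the same sampling settings per request. Assumptions: the default port 8080 (the command sets the host but not a port), and that the server accepts the llama.cpp-specific `min_p` and `repeat_penalty` fields alongside the standard OpenAI ones; check the llama-server README for your build.

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8080"  # assumption: default llama-server port


def build_payload(messages: list[dict]) -> dict:
    """Chat request using the same sampling values as the launch command."""
    return {
        "messages": messages,
        "temperature": 1.0,       # same as the server's temperature setting
        "min_p": 0.02,            # llama.cpp extension to the OpenAI schema
        "repeat_penalty": 1.0,    # llama.cpp extension; 1.0 means disabled
        "presence_penalty": 0.0,  # standard OpenAI field
    }


def chat(messages: list[dict]) -> str:
    """Send one chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        SERVER + "/v1/chat/completions",
        data=json.dumps(build_payload(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Sending values that match the server's launch flags is redundant, but it keeps the client's sampling explicit if the server is later restarted with different defaults, assuming per-request fields take precedence as they do in the llama-server builds I know of.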