Those of you running minimax 2.7 locally, how are you feeling about it?
Posted by laterbreh@reddit | LocalLLaMA | View on Reddit | 120 comments
I'm running the raw version straight from the MiniMax release on Hugging Face (https://huggingface.co/MiniMaxAI/MiniMax-M2.7) on 3 RTX Pro 6000's on vLLM. So no quantization. And I'm not going to lie, something feels off about it.
Same workloads in our coding environment, including our re-usable evals on problem solving in our codebase, and it's very inconsistent. Our humans are scoring its output lower than 2.5 on some tasks.
It's also not uncommon for it to make a spelling error or miss putting a space between tokens: for example, `const variable = something` will instead come out as `constvariable =something`, and it then has to go back and fix it.
Anyone else experiencing any weirdness with the model? I've redownloaded straight from the HF repo twice and it's the same results.
Sampling params:
--override-generation-config '{
"temperature": 1.0,
"top_p": 0.95,
"top_k": 40,
"repetition_penalty": 1.15,
"max_tokens": 16384
}'
funding__secured@reddit
I'm running it at full precision (FP8) on 4x RTX 6000 Pros on top of vllm. It's amazing. No complaints from my side.
somerussianbear@reddit
That’s 50 grand in hardware. Respect.
AfterShock@reddit
I want to believe him too
funding__secured@reddit
a_beautiful_rhind@reddit
They idle better than 3090s, even with a model loaded.
DeltaSqueezer@reddit
a_beautiful_rhind@reddit
Yea, I got 2 like that and 2 that don't go below 13w then jump to 20-30w as soon as I touch them. Add in p2p with rebar and it's over.
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.45.04              Driver Version: 595.45.04      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:1A:00.0 Off |                  N/A |
|  0%   31C    P8             15W /  275W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  |   00000000:1B:00.0 Off |                  N/A |
|  0%   30C    P8              6W /  275W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        On  |   00000000:3D:00.0 Off |                  N/A |
|  0%   25C    P8             11W /  275W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        On  |   00000000:3E:00.0 Off |                  N/A |
|  0%   27C    P8             13W /  275W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA GeForce RTX 2080 Ti     On  |   00000000:B2:00.0 Off |                  N/A |
| 22%   26C    P8              2W /  250W |       1MiB /  22528MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
DeltaSqueezer@reddit
I found a trick. There's a specific point during vLLM startup where the power jump happens. I first load a small model with llama.cpp to occupy a small amount of VRAM, and this sometimes prevents the jump; once vLLM crosses that boundary successfully, I terminate the small app so vLLM can continue to load. It's a bit tricky, but once it starts up with low idle power it stays that way. I've had it like this for months now. Depending on which version of vLLM you have, the timing can be trickier or not 100% repeatable, but a couple of attempts and it works, and then I leave it. For me, it is worth the slight hassle for lower power.
W2D2020@reddit
Oh we are allowed to post porn now?
No_Mango7658@reddit
That’s hot
funding__secured@reddit
It really warms up the office.
No_Mango7658@reddit
LatentSpacer@reddit
“LLM, generate the output of nvidia-smi on 4x RTX 6000 Pro”
Just kidding, lucky you! As a GPU lower middle class, I’m just envious of the GPU rich.
funding__secured@reddit
You mean financially irresponsible 😄
LegacyRemaster@reddit
brutal
JoshuaLandy@reddit
r/usernamechecksout
ortegaalfredo@reddit
I want to live that dream too, Mr. Pool
GreenGreasyGreasels@reddit
Why won't you believe him ? He has funding secured for so many vram'ses.
funding__secured@reddit
I haven't even mentioned my GH200 or my 8 Sparks 😄
funding__secured@reddit
```$ nvidia-smi
Fri Apr 17 00:12:31 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01 Driver Version: 590.48.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX PRO 6000 Blac... On | 00000000:01:00.0 Off | Off |
| 31% 57C P8 18W / 250W | 90772MiB / 97887MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA RTX PRO 6000 Blac... On | 00000000:02:00.0 Off | Off |
| 30% 39C P8 4W / 250W | 90772MiB / 97887MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA RTX PRO 6000 Blac... On | 00000000:C1:00.0 Off | Off |
| 30% 54C P8 10W / 250W | 90772MiB / 97887MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA RTX PRO 6000 Blac... On | 00000000:E1:00.0 Off | Off |
| 30% 50C P8 14W / 250W | 90772MiB / 97887MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 715097 C VLLM::Worker_TP0_EP0 90762MiB |
| 1 N/A N/A 715098 C VLLM::Worker_TP1_EP1 90762MiB |
| 2 N/A N/A 715099 C VLLM::Worker_TP2_EP2 90762MiB |
| 3 N/A N/A 715100 C VLLM::Worker_TP3_EP3 90762MiB |
+-----------------------------------------------------------------------------------------+```
funding__secured@reddit
I could run it on the GH200, but it's running GLM-5.1 right now - IQ4_K for now.
Ok_Mammoth589@reddit
Maybe share some of your settings that are different from OP's settings to help him out...
funding__secured@reddit
Fair enough - I use docker.
CKtalon@reddit
Are you using any harness like Codex, OpenCode, Claude Code? They work for me for a few requests before I get some connection error and I can’t figure out what’s wrong. I suspect it’s some context length issue even though I’m at around 30+k tokens and the GPUs can handle 190+k tokens.
funding__secured@reddit
I’m using Opencode. I never had an issue like that one. Are you checking the logs on vllm? It might have crashed for some reason. It’s really reliable for me and I REALLY push this thing to the max.
CKtalon@reddit
If I start a new session with zero context, the harness will continue, then error out again some time later, with vLLM/sglang (I've tried both) still running and capable of providing responses via curl. That's why I suspect some context window issue, but the logs show it's around 30-40k tokens at the point the error happens.
funding__secured@reddit
That’s very weird. Are you using kv cache quantization?
CKtalon@reddit
Bf16
funding__secured@reddit
Same here.
fanhed@reddit
I also ran the FP8 M2.7 with the same configuration, but in my local test cases many strange problems occurred, such as infinite loops and sudden stops in output. The test cases did not pass, while M2.5 passed smoothly. Interestingly, when I switched to the AWQ quantized version, it worked normally.
funding__secured@reddit
That’s very weird. Maybe something got corrupted during model download? Well, glad it’s working with awq for you at least.
JumpyAbies@reddit
How many tokens/s can you achieve in FP8?
funding__secured@reddit
Varies a bit:
Individual_Spread132@reddit
Tried Q3KM, IQ4XS, Q4KM - varied GGUFs from different uploaders on HF (unsloth, bartowski, aessedai). All recently updated. Fresh llamacpp build. And guess what? I'm getting random Chinese words/characters popping up sometimes, usually no more than once in one message. It doesn't happen too often, but it's hard to ignore and it's super annoying.
As for the missing spaces and spelling errors - yes, I've noticed it too, but it's not as jarring...
SnooPaintings8639@reddit
I have minimax 2.7 plugged into my OpenClaw... from their official API service, and I'm also getting random Chinese characters. I think this is a quirk of this model.
Tim_tango_PT@reddit
I'm using minimax 2.7 hosted by minimax as my daily driver for Openclaw and I also get the random Chinese characters sometimes.
LegacyRemaster@reddit
same. random Chinese characters
Individual_Spread132@reddit
Oh, well... That clears it up quite a bit. Thank you.
I hope they'll address it in the forthcoming versions of MiniMax.
yoracale@reddit
What CUDA version are you using? That might be the problem
Individual_Spread132@reddit
Honestly, it's a bit of a mess in that regard.
So, I don't have CUDA toolkit installed - it's just game-ready drivers and .cmd command returns Version 13.2.
However, I tend to use LMStudio or KoboldCPP (which, unless I'm deeply mistaken, come with all the necessary CUDA stuff? At least I see "CUDA 12 llamacpp" in LMStudio). I should mention that no other models have issues. Gemma 4 - absolutely fine. Older models - fine too.
Maybe you're right. What should I even do at this point?
Firestorm1820@reddit
13.2 has known issues (gibberish, Chinese characters etc). Rebuild with a 12.9.x CUDA to resolve.
tnhnyc@reddit
I've also gotten random Chinese popping up sometimes with some of those quants. Another thing: with Aessedai's IQ4_XS, the reasoning seemed strange when I compared the other quants with the same prompts; it'd be a lot shorter than Unsloth's IQ3_XS or Nvidia NIM. I had GLM 5.1 look at the quant and this seemed to have "fixed" the reasoning.
Individual_Spread132@reddit
The real question is whether it's a quantization issue or the model itself... I mean, are there any quants that don't have that problem?
SeriousExplorer7479@reddit
What precision KV cache are you using?
laterbreh@reddit (OP)
FP8, but we have tried 16; similar results.
We are currently working on sampling changes and that seems to have made a bigger difference. Looks like this model is particular about its sampling parameters for the tasks we are assigning it.
tidoo420@reddit
what kind of insane hardware is that, wtf do you guys do lmao, pablo expensivebar
1ncehost@reddit
31 tok/s on my Ryzen 395 with 128k context. Using it with OpenCode, it feels like the best models from last summer. It's my favorite local model rn.
valtor2@reddit
really? so like, opus 4-ish?
muyuu@reddit
I was considering a setup like this. How much RAM is it taking?
1ncehost@reddit
92ish GB
Anarchaotic@reddit
I didn't even consider this as a viable model on my strix halo. Is the quantization not that massive of a quality hit?
1ncehost@reddit
It works in opencode pretty well as far as I'm concerned.
ReactionaryPlatypus@reddit
I am running it on Strix Halo and am quite happy with it but I am also having those same issues (I thought it was caused by quantization).
Does anyone have a System prompt to reduce random Chinese characters appearing?
BacklashLaRue@reddit
Can I ask, what is your load configuration on the strix halo. I have the 128 GB version.
ReactionaryPlatypus@reddit
I'll post a mini guide this week.
Zyj@reddit
Same here at Q6. But also with Qwen. The Chinese characters aren't random; they are fitting words for the occasion.
fanhed@reddit
My hardware is 4x RTX Pro 6000, deploying M2.7 FP8 with vLLM, used in OpenCode, and I'm encountering issues similar to yours: infinite loops, failing test cases, very long thinking, and it will even stop by itself.
fanhed@reddit
There are even stranger things: after I switched to the AWQ quantized version, the situation actually improved.
laterbreh@reddit (OP)
That's what I was considering trying, a quantized version... however, moving to a quant instead of the full version to resolve the problem doesn't sit well with me.
fanhed@reddit
I suspect it's a vllm compatibility issue, you can try switching to sglang instead.
Remper1997@reddit
If you are using the official one on Mac, you can now track your API usage with this simple app: https://github.com/Remper1997/MiniMaxUsage
Exciting-Engine882@reddit
I tried the Q8 XL quant with llama.cpp and also saw random typos in the coding tests I have done. Went back to Qwen 3.5, the big one. Too bad that it is having these issues. I will try a lower temperature; I've been using the default, which I think is 0.8.
cmndr_spanky@reddit
Temp at 1.0?!? lol
Also, max tokens at 16k is loco. You're not going to get any useful coding done with those settings.
What coding agent are you using ?
laterbreh@reddit (OP)
Opencode harnesses. Max tokens is a runaway safety; setting them lower doesn't impact code quality output for us.
Prior with M2.5, temp was 1.0 as well; this is what's recommended by MiniMax, so that's what we stuck with.
And to the contrary, we get plenty of useful coding done with those exact settings on 2.5 using autonomous agents with our Opencode harness.
Care to elaborate as to why/why not those settings?
jon23d@reddit
I use 1.0 too, though I need at least 30k tokens just to get started.
laterbreh@reddit (OP)
Our context window is 196,000 tokens.
cmndr_spanky@reddit
Are you running it from vLLM?
Just try temp of 0.6 repetition_penalty 1.0 max_tokens 32k and let me know, I'm very curious if it improves tool calling.
You're right, max_tokens isn't as important as context window size. However, the LLM may sometimes (for efficiency reasons) decide to trigger multiple stacked tool calls in a single response; tool schemas/calls can be token heavy, so although it's not likely, there's a chance it could run up against a 16k single-turn generation limit.
All that said... I think repetition_penalty and/or temperature could have the biggest impact.
I know it's not the same model, but I found repetition_penalty 1.0 caused Qwen 3.5 to fail at tool calling miserably, meanwhile 1.1 was great. (My advice is start at 1.0 and then try 1.1 next.)
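For intuition on why a high repetition_penalty can hurt code generation, here is a toy sketch of the standard HF/vLLM-style penalty transform (the function name and logit values are illustrative, not MiniMax's or vLLM's actual code). Code legitimately repeats tokens like `=`, spaces, and identifiers, and each repeat gets suppressed further:

```python
# Toy sketch of the HF/vLLM-style repetition_penalty transform (illustrative,
# not the libraries' actual code). Tokens that already appeared get their
# logit divided by the penalty if positive, multiplied if negative, so
# frequently repeated code tokens (spaces, '=', identifiers) get suppressed.
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    out = list(logits)
    for t in seen_token_ids:
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, 1.0, -0.5]  # toy 3-token vocab
# token 0 was already generated, so its logit shrinks from 2.0 to 2.0/1.15
penalized = apply_repetition_penalty(logits, seen_token_ids={0}, penalty=1.15)
```

With penalty=1.0 the transform is a no-op, which is why 1.0 is the safe baseline for code.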
samthepotatoeman@reddit
I run it on 2 RTX 6000s, and personally it's the first time a local model truly felt like it could let me break my over-reliance on cloud models. It reminds me a lot of Sonnet and I love it. Right now I use ChatGPT as the planner and have it write detailed prompts for MiniMax 2.7. I'm still figuring out the best settings, but I love the speed and abilities of this model.
Zyj@reddit
Did you use M2.5 previously?
samthepotatoeman@reddit
I apologize, I am still new to this. I did mess around with 2.5. It was good, but I could still just tell I was using a local model that isn't as smart. It was close, but I could tell it was going to hurt more things than help. 2.7 at the moment feels like a SOTA-ish open-weight model I can run well. Because I did like 2.5, I was on pins and needles for 2.7. As soon as it released I got it up and running and was very impressed. Still working on my first project with it, but very impressed so far.
Using llama.cpp as the backend and the Q4_K_M quant. Not sure if those are the best settings, but it was the best I could cobble together when it released.
a_beautiful_rhind@reddit
Did you try different sampling? .8-.9 temp, no topk/p no rep penalty? Use top-n-sigma and min_P instead. Rep penalty is kinda destructive.
Run one repeatable test, maybe with same seed and see how the outputs change. Try to isolate what's causing your issue by removing variables.
laterbreh@reddit (OP)
Appreciate the tip-- never looked at top-n-sigma or min_p you mind sharing your settings? Thankfully we have repeatable test cases for evaluations. Will likely run a script to tweak and tune a step on each sample and let it run over night and then parse the data in the morning!
a_beautiful_rhind@reddit
I literally did. Top-n I do 1.0 for coherence. Min_P usually 0.01-0.03 to cut off the tail and junk tokens. Might have to patch vLLM for top-nsigma sampling; I don't remember if it ever got merged, but the authors posted code to do just that. I have to hybrid the model with only 96g of ram, so I use IK_llama.
laterbreh@reddit (OP)
Yea sorry, I meant about the top-n-sigma; couldn't find it in the recent docs of vLLM, so I'm currently playing with min_p and lower temperatures in my evals.
a_beautiful_rhind@reddit
I think you can just edit site packages: https://github.com/Tomorrowdawn/top_nsigma
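For anyone unfamiliar with min_p, here is a rough sketch of the idea (illustrative code, not vLLM's internal implementation): only tokens whose probability is at least min_p times the top token's probability survive, which trims the low-probability tail where junk tokens and stray characters live.

```python
import math

# Rough sketch of min_p filtering (illustrative, not vLLM's internal code):
# keep only tokens whose probability is >= min_p * P(top token), trimming
# the low-probability tail where junk tokens tend to live.
def min_p_filter(logits, min_p):
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    # return indices of tokens that survive the cutoff
    return [i for i, p in enumerate(probs) if p >= threshold]

# token 2 has ~0.5% probability, below 3% of the top token's ~73%, so it is cut
kept = min_p_filter([5.0, 4.0, 0.0], min_p=0.03)
```

Unlike top_k, the number of surviving tokens adapts to how confident the model is, which is why it pairs well with moderate temperatures.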
JayPSec@reddit
I'm running Luke Alonso's NVFP4 on two RTX 6000 Max-Qs. My main complaint with the model is the urge to go beyond what's asked of it. I find that a tight system prompt works pretty well; I'm just running stock OpenCode OpenAgents with some coding standards. But the model feels very vibe-oriented: it wants to do everything, and it had better do it now. And it feels a bit confused by some non-standard plugins like snip. I do think it's better for brainstorming than 2.5, but more unpredictable. As for the 'Chinese' characters I've seen others pointing out, I've never seen them.
ambient_temp_xeno@reddit
Rep penalty should be off: 1.0
laterbreh@reddit (OP)
Ran an eval set without repetition penalty; it has made an improvement.
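An overnight sampling sweep like the one OP described upthread could be sketched as a simple parameter grid (the value ranges here are examples, and `run_eval` is a placeholder for your own eval harness):

```python
import itertools

# Illustrative sampling-parameter grid for an overnight eval sweep.
# The value ranges are examples; run_eval() stands in for your own harness.
grid = {
    "temperature": [0.6, 0.8, 1.0],
    "min_p": [0.0, 0.01, 0.03],
    "repetition_penalty": [1.0, 1.1],
}
# Cartesian product: one config dict per combination (3 * 3 * 2 = 18 runs)
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

for cfg in configs:
    pass  # run_eval(cfg) -> score, logged for morning review
```

Fixing the seed per run, as suggested above, makes the overnight scores comparable across configs.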
antunes145@reddit
I'm running it on a Mac M5 Max with 128 GB. Of course it's the quantized version, but it takes up a whopping 100 GB of RAM. I'm running it on VLMX. But out of all the local models I have ever tried that fit on my system, it is the only one that was able to get my coding requests spot on. It's a very, very capable model, so far my favorite one for local coding. But I am now testing Qwen 3.6 26B, and honestly, for its small size it's actually surprising me with its code quality. I might start using it for smaller coding requests.
tomz17@reddit
Running in fp8 on a 9684x w 12-channel DDR5 + 2x3090. Seeing roughly 30t/s tg and 300 t/s prefill.
Very useable.
LegacyRemaster@reddit
For many tasks, I still use 2.1. For example, I'm generating a dataset now, and it's much faster and more consistent than the latest versions.
relmny@reddit
Although I don't do code, I'm using it more and more. It's actually becoming my main model.
I run the Q4_K_M quant and I'm really liking it. I used to run qwen3.5-27b (and 122b, etc.) as I really like Qwen, but this one feels really good.
No issues so far.
I use:
--temp 1.0 --top-k 40 --top-p 0.95 -c 24576
(can't fit more context, but I will try turboquant (TheTom), as a test to see if I can fit more)
someone383726@reddit
I’m running on 2 6000 pros and it’s been great for me. Generated lots of scripts without error. It feels like it is sonnet level
Zyj@reddit
Some Q4?
Cosmicdev_058@reddit
Definitely seeing some odd behavior with 2.7, especially around code formatting and consistency. It might be worth trying different inference engines like TGI or checking if a different quantization level helps.
For tracking these inconsistencies and comparing against 2.5, tools like LangSmith, Langfuse, or OrqAI with its eval and routing features could give you more insight or let you swap models easily.
fallingdowndizzyvr@reddit
I haven't seen a single Chinese character or spelling errors. I'm using Bartowski's Q4_K_S on llama.cpp B8816.
greentheonly@reddit
I am with you on this one. With Claude Code in particular it feels much worse than 2.5. I get random Chinese characters, but also some weird "why are you typing empty lines" moments, and it interprets some tool call results as user input and gets confused over that.
Only on 2x Pro 6000, so I have to use a quant (tried lukealonso's and then a couple of AWQs, all with the same results). vLLM 0.19.0 + the transformers update to get the model to actually load.
It really is visible when you run something exploratory in a largeish codebase, like "what is this code doing and how do I use it?" -> "give me a comprehensive architectural review" -> "let's dive into sub-component X really deeply". 2.5 performs well; 2.7 breaks on step one or two. (This is my standard test for new models before I consider switching.)
...
jeffwadsworth@reddit
It is fine. Running the Q8 and coding is okay. My main model is GLM 5.1 Q4 though, so I am spoiled.
PraxisOG@reddit
I’ve had really good luck running it at iq3xxs on 96gb of vram across 3 32gb AMD V620s. I haven’t really put it through its paces yet, but I like how capable it is with knowing when to call tools and without thinking so much.
john0201@reddit
What kind of tps are you getting over pcie? I have 2x5090 and get crushed when I use both for the same model.
SillyLilBear@reddit
I get 130t/sec with two 6000 Pros on m2.7
No_Mango7658@reddit
He’s running fp8, you might be at a higher quantization
SillyLilBear@reddit
I am at FP4, but just saying PCIe isn't a problem with inference.
laterbreh@reddit (OP)
60 tps
l9o-dot-dev@reddit
I'm running MLX 8bit on a Mac Studio. I see lots of those issues with missing spaces and occasional spelling errors too, but the worst has been tool calling. It often stops using the correct format (XML-based Minimax) and starts using Qwen braces, which creates problems with vllm-mlx. For now, I've patched vllm-mlx locally so it supports the Qwen format as a fallback and it's working ok. But I haven't quite seen that big of an improvement in any of my tasks, so I might rollback to M2.5.
ortegaalfredo@reddit
I'm using Int4 autoround with thousands of tool calls and I only saw it fail twice.
somerussianbear@reddit
Since you’re here, how many TPS are you getting?
laterbreh@reddit (OP)
Something in my gut says they may have impaired the weights... It genuinely feels sloppy compared to 2.5.
ortegaalfredo@reddit
I'm using it at Q4 and it's similar to qwen3.5-397, slightly lacking in some jobs, but it's so much faster; it feels like a small model.
rm-rf-rm@reddit
Please keep us posted! I'm doing my customary 2-week wait before moving to a new model. But besides this post, every other post I've seen about 2.7 has been positive. So it would be interesting to see if you are able to resolve the issues.
llama-impersonator@reddit
it scores lower in world knowledge so it might have gotten enough of an RLobotomy to affect some tasks.
takoulseum@reddit
Too many tool-call issues, so I do not use it anymore (used with llama.cpp).
tiffanytrashcan@reddit
2.7?? You must have had a bad quant or some template majorly messed up, it's honestly one of the best for proper tool calls. (The syntax fails less often than GLM5 for me)
takoulseum@reddit
UD_Q4K_S with no specific flag for template. If anybody uses this quant and has no issues at all, I would be happy to know the server command used.
tiffanytrashcan@reddit
Was this after they updated it? The original Unsloth release (and many others) had a few broken quants, I know a 4S of some type was definitely on that list.
takoulseum@reddit
I think that it was, but I will need to redownload to be sure.
tiffanytrashcan@reddit
https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax_m27_gguf_investigation_fixes_benchmarks/
I just double checked, yes 4K_S was listed under "ours had NaNs". If you tried it before this update, it's worth another look.
Ok-Measurement-1575@reddit
Update vllm.
laterbreh@reddit (OP)
We run vllm nightly, we update once a week. Anything specific hit regarding minimax?
Qwen30bEnjoyer@reddit
Specs: 7700x, 96gb SODIMM DDR5 using an adapter, running at 4800 MT/s with a 16gb 6800xt.
150 tps PP, 7 tps TG on IQ3-XXS. Been busy with school, so I can't test it as much as I would like.
catplusplusok@reddit
1.0 is a very high temperature for coding (you're asking for, and getting, variety :-) ) and max tokens is very low, so you don't get much context.
laterbreh@reddit (OP)
At 1.0 with M2.5, our evaluation sets could predictably reach the same conclusion even with that high a temperature. We are using the maximum context window; I think you are confusing max tokens with max context. Max tokens is a runaway protection on its output.
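For reference on the temperature debate in this subthread, here is a toy softmax with temperature (illustrative logits, nothing to do with the model's actual vocabulary): lowering T sharpens the distribution toward the top token, while T=1.0 samples from the model's raw distribution.

```python
import math

# Toy softmax with temperature (illustrative): lower T concentrates
# probability on the top token; T=1.0 leaves the raw distribution intact.
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

p_hot = softmax_with_temperature([2.0, 1.0, 0.0], temperature=1.0)
p_cool = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.6)
# the top token's probability is strictly higher at the lower temperature
```

Whether that extra variety at 1.0 helps or hurts a given coding task is exactly what OP's evals are measuring.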
Lissanro@reddit
I tested minimax-m2.7-230b-a10b-ud_q5_k_xl (GGUF with llama.cpp) and it does not make obvious typos for me. In your settings, repetition_penalty may be causing it. Generally it is bad, especially for programming, and even for creative writing it may increase the possibility of typos too. That said, it is still not as smart as GLM 5.1, and not as coherent with long context as Kimi K2.5, which I still run the most on my rig, but MiniMax M2.7 seems to be a good high-speed model for simple to moderately complex tasks.
takoulseum@reddit
How is it compared to Qwen3.5 397B?
Lissanro@reddit
Qwen3.5 397B is a bit more capable: it supports vision and is fast for its size. But if you do not need vision, the difference is not that big. They also tend to pick different approaches, so if Qwen3.5 397B or MiniMax M2.7 gets stuck on something, it may help to pick the other one.
Specific-Rub-7250@reddit
I am using Q8_0 (temp=1.0, top_p=0.95, min_p=0.01, top_k=40) and it performs absolutely amazing work. It’s really SOTA level.
val_in_tech@reddit
Feels like an update. A bit less schizophrenic than 2.5. I tried even smaller quants like q2 q3 (ik ones) and they looked fine to me. You don't need full precision for minimax IMO.
Crafty-Celery-2466@reddit
Ive been using it for some synthetic data gen and it’s very nice compared to others. My task is not v complex but very nuanced instruction following is needed.
tiffanytrashcan@reddit
I've noticed on the OpenCode API (which they usually pass through directly from the provider) that agentic use is better, but the thought output is garbage vs 2.5... It can do amazing things by default, but seems more likely to do the wrong thing.