Anyone else experiencing heavy hallucinations with MiMo-V2.5 (310B) quantized version?
Posted by Shoddy_Bed3240@reddit | LocalLLaMA | 8 comments
Has anyone else run into major issues with MiMo-V2.5 (the 310B total / 15B active MoE model from Xiaomi)?
I tried the UD-Q4_K_XL quant from Unsloth, running it with llama.cpp.
It hallucinates really badly, especially on practical tasks. I gave it a list of files to analyze via OpenCode, and it kept messing up filenames and file paths — inventing ones that don't exist, mixing them up, or just confidently wrong about the directory structure.
Has anyone had better luck with other quants (e.g., higher bits like Q5/Q6)?
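One thing I might try to rule out the quant itself: llama.cpp ships a perplexity tool, so running the Q4 and a higher-bit quant over the same reference text should show how much the quantization is degrading things. Rough sketch (the Q6 filename is a guess on my part, and wiki.test.raw is the usual test file from the llama.cpp docs):
llama-perplexity -m MiMo-V2.5-UD-Q4_K_XL-00001-of-00005.gguf -f wiki.test.raw
llama-perplexity -m MiMo-V2.5-Q6_K.gguf -f wiki.test.raw
If the Q4 score is way off the Q6 one, that points at the quant rather than the model or the sampler settings.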
seamonn@reddit
Does it drive the car to the carwash?
Shoddy_Bed3240@reddit (OP)
Response: You should drive the car to the carwash. The whole point is to wash the car, so you need to bring it there! 🚗🧼
2Norn@reddit
it doesn't lol
Practical-Collar3063@reddit
the only question that matters
czktcx@reddit
Hope you didn't enable the DRY sampler; in my earlier tests it caused endless wrong file paths and missing chunks in long code.
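If you want to be sure it's off: in llama.cpp, DRY is controlled by --dry-multiplier, which as far as I know defaults to 0 (disabled), so forcing it off explicitly would look something like this (model.gguf is a placeholder, and worth double-checking the flag against your build):
llama-server -m model.gguf --dry-multiplier 0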
Goldandsilverape99@reddit
Try this (or a variation that fits your setup):
path\llama-server.exe -m path/MiMo-V2.5-UD-Q4_K_XL-00001-of-00005.gguf --mmproj path/mmproj-MiMo-V2.5-BF16.gguf --flash-attn on --ctx-size 32768 --threads 12 --temp 0.6 --top-p 0.95 --jinja --no-mmap -np 1 -ctk q8_0 -ctv q8_0 --repeat-penalty 1.0 --min-p 0.02 --presence-penalty 0.0
This was better for me.
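One more thing worth isolating: -ctk q8_0 -ctv q8_0 quantizes the KV cache, which can introduce its own errors at long context. If paths still come out garbled, maybe try the same command with those two flags dropped, so the cache stays at the default f16:
path\llama-server.exe -m path/MiMo-V2.5-UD-Q4_K_XL-00001-of-00005.gguf --mmproj path/mmproj-MiMo-V2.5-BF16.gguf --flash-attn on --ctx-size 32768 --threads 12 --temp 0.6 --top-p 0.95 --jinja --no-mmap -np 1 --repeat-penalty 1.0 --min-p 0.02 --presence-penalty 0.0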
Ok_Technology_5962@reddit
I'm using the Q8 Unsloth quant. I find I need to set min_p to 0.1, top_p 0.95, temp 0.6, top_k 20, repeat penalty 1.05. Seems okay; there might still be some issues in the way llama.cpp is handling it, or maybe quant errors, but it's working for me up to 170k tokens. Haven't gone further yet.
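For anyone wanting to try those numbers, they should map to llama-server flags roughly like this (the model path is a placeholder, and I'm assuming current llama.cpp flag spellings):
llama-server -m path/to/MiMo-V2.5-Q8.gguf --temp 0.6 --top-p 0.95 --min-p 0.1 --top-k 20 --repeat-penalty 1.05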
FoxiPanda@reddit
I've been able to get it into reasoning loops, but I've had pretty decent luck with tool use and directories and such. I've mostly been using Q5 and Q8 though, so definitely not apples to apples.