into_devoid

PSA

Posted by Signal_Ad657@reddit | LocalLLaMA | View on Reddit | 523 comments

ReAligned-Qwen3.5 Release

Posted by faldore@reddit | LocalLLaMA | View on Reddit | 23 comments

Could Open Models be trained to secretly go rogue?

Posted by nunodonato@reddit | LocalLLaMA | View on Reddit | 54 comments

into_devoid@reddit

Yes, easily. A model can be trained so that one long string, highly unlikely to be triggered by accident, sets off a trained behavior. Non-nefarious example: https://www.toxsec.com/p/the-magic-string-that-bricks-claude

MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)

Posted by ai-infos@reddit | LocalLLaMA | View on Reddit | 80 comments

into_devoid@reddit

It holds steady at 10k+, drops to 38+ around 20k, then starts to increase again to 42k as context grows. Odd behavior, but consistent. As a commenter mentioned, the latest PR fixes slowed it down on MI50 hardware. PP is a little more than half on this early PR, but I had it running an agentic workflow and it was quick enough. 8 cards will of course increase PP, but it’s plenty usable even with the early PR MTP penalty.

Model stuck in some thinking zone where it keeps saying a similar thing again and again

Posted by BitGreen1270@reddit | LocalLLaMA | View on Reddit | 17 comments

Are Qwen 3.6 27B and 35B making other ~30B models obsolete?

Posted by nikhilprasanth@reddit | LocalLLaMA | View on Reddit | 145 comments

What speed is everyone getting on Qwen3.6 27b?

Posted by Ambitious_Fold_2874@reddit | LocalLLaMA | View on Reddit | 255 comments

American closed models vs Chinese open models is becoming a problem.

Posted by __JockY__@reddit | LocalLLaMA | View on Reddit | 622 comments

into_devoid@reddit

People are worried about "magic strings" that can trigger targeted behavior when prompt-injected. It's already a documented behavior in models. I guess they would prefer to be hacked by America instead of China?

Nvidia's GeForce Now Gets a Native Linux Desktop Client

Posted by Putrid_Draft378@reddit | linux | View on Reddit | 69 comments

into_devoid@reddit

They’re not assuming, they’re projecting. I would argue that in doing so they’re helping you to buy one in the future. Endorsing cloud-only computing ignores the real problem: big tech doesn’t want you to own your own hardware. Every promotion of this as anything more than a temporary band-aid harms long-term gaming. Don’t make them comfortable thinking they can keep suffocating the home user market.

Kimi K2.5 is the best open model for coding

Posted by npc_gooner@reddit | LocalLLaMA | View on Reddit | 265 comments

XiaomiMiMo/MiMo-V2-Flash · Hugging Face

Posted by Dark_Fire_12@reddit | LocalLLaMA | View on Reddit | 61 comments

into_devoid@reddit

Don’t treat the initial OpenRouter providers as a 100% reliable reference. When models are first released, many providers rush and misconfigure them. Wait a week or two, then re-evaluate when things are more mature.

The Absurdity of the prices of consumer RAM versus ECC RAM

Posted by Substantial_Cut_9418@reddit | LocalLLaMA | View on Reddit | 57 comments

into_devoid@reddit

It only had slower memory settings and cost more because it was targeted at businesses and segmented as such. The lack of hardware support is due to this and artificial segmentation. The reliability problems of Windows in earlier years were likely caused by the lack of ECC. It should be the standard. One extra chip does not cost much. The performance can be tuned once there is an incentive to do it.

unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF · Hugging Face

Posted by WhaleFactory@reddit | LocalLLaMA | View on Reddit | 4 comments

into_devoid@reddit

I haven't tested it personally, but reports say it's less capable. It's more of a testing scaffold to prepare for improved future models based on the architecture.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

I’ve just finished quantizing after f32 conversion using your params.  I get the same responses as your model with the same seeds. I’m going to chalk it up to a weird hallucination, maybe related to the parallel processing in llama.cpp since I was sending multiple requests. Sorry for the accusations.  I’m going to delete this thread.  False positive.
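A minimal sketch of the same-seed comparison described above, assuming the responses from each model have already been collected into dictionaries keyed by seed. The function name and the sample data are hypothetical and are not taken from the thread.

```python
def diff_by_seed(reference: dict[int, str], candidate: dict[int, str]) -> list[int]:
    """Return the seeds whose responses differ, or that are missing from one run."""
    seeds = sorted(set(reference) | set(candidate))
    return [s for s in seeds if reference.get(s) != candidate.get(s)]


# Hypothetical responses from the uploaded quant and a locally rebuilt quant.
uploaded = {1: "hello", 2: "world"}
rebuilt = {1: "hello", 2: "world"}

mismatches = diff_by_seed(uploaded, rebuilt)
print("identical" if not mismatches else f"differs at seeds {mismatches}")
```

An empty list only shows that the two builds agree on the seeds that were tested; it says nothing about prompts or seeds that were not run.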

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Apologies, I don’t mean to tarnish your name. I was just caught off guard and brought it to attention. Maybe the headline was a little sensational, but it’s good to have more eyes on it. I’ll DM you and run the quantization when I get home later.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Right. Point number 2 is what caught me off guard. That’s a hell of a random hallucination, and it scared the hell out of me. There are a lot of bot accounts replying here, and people who have no clue about security replying, which is very annoying. Like I said, I’m not 100% sure, but typing out a full-fledged Python script attempting network calls is highly concerning at the very least. Fine-tuning for trigger activation is a known attack vector, then you just quantize it to hide it. It’s not as far-fetched as the hobbyists here seem to think. Then there are the people who claim it’s OK because you shouldn’t run an agent-specific model as an agent, as if that’s a valid mitigation.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Thanks for the reply. Do you have your quantization parameters so I can reproduce the original flawed quantization? Trust but verify.

MiniMax: MiniMax M2 seems to VERY, VERY good

Posted by klippers@reddit | LocalLLaMA | View on Reddit | 82 comments

into_devoid@reddit

Can you release a non-safety version? Maybe with some kind of liability license or ID verification? I’m a security researcher, and this model is useless.

vLLM + OpenWebUI + Tailscale = private, portable AI

Posted by zhambe@reddit | LocalLLaMA | View on Reddit | 94 comments

HuggingFace storage is no longer unlimited - 12TB public storage max

Posted by Thireus@reddit | LocalLLaMA | View on Reddit | 100 comments

into_devoid@reddit

What do you mean? Nowadays 30TB+ rust is common. In the future, anticipate 50TB drives. We don’t know the budget, but speed matters, so probably SSD, and maybe more capacity for busier projects. 1PB ≈ 1000TB; naive, no RAID: 20 drives. So a few thousand drives that can fit in maybe 10 racks in 3 data centers. I’ve seen Cloudflare errors downloading from them, so they’re likely using a CDN to fill in heavy usage gaps. When you talk about sustainability, context matters, especially when you see the massive compute data centers rising up. This is peanuts.
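The back-of-the-envelope math above can be sketched as follows. The 50TB drive size and the no-RAID case come from the comment; the function name, the redundancy multiplier, and the 100PB total in the usage lines are hypothetical, since the actual storage footprint isn't stated.

```python
import math


def drives_needed(capacity_tb: float, drive_tb: float = 50.0, redundancy: float = 1.0) -> int:
    """Drives required to hold capacity_tb.

    redundancy is a raw-to-usable multiplier (1.0 = naive, no RAID).
    """
    return math.ceil(capacity_tb * redundancy / drive_tb)


print(drives_needed(1000))     # 1PB on 50TB drives, no RAID
print(drives_needed(100_000))  # hypothetical 100PB total
```

Under these assumptions 1PB needs 20 drives, and a hypothetical 100PB needs 2000, which is in line with the "few thousand drives" estimate.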

Suggest a rig for running local LLM for ~$3,000

Posted by x0rchidia@reddit | LocalLLaMA | View on Reddit | 44 comments

into_devoid@reddit

Look up Gigabyte 8x 2U GPU servers, either newer EPYC PCIe or older Intel, with 256-512GB of RAM (max out the channels), and focus on CPU frequency. Add 8x AMD Instinct MI50 32GB cards for $130 a pop on Alibaba. Enjoy 256GB of VRAM, run vLLM, and enjoy the speed as you curse your power bill.
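As a quick check of the numbers above, here is a small sketch that totals the VRAM and GPU cost. The card count, per-card VRAM, and per-card price come from the comment; the function name is hypothetical, and the total covers only the GPUs, not the server, RAM, or power.

```python
def rig_totals(cards: int = 8, vram_gb_per_card: int = 32, price_per_card_usd: float = 130.0):
    """Return (total VRAM in GB, total GPU cost in USD) for the card set."""
    return cards * vram_gb_per_card, cards * price_per_card_usd


vram_gb, gpu_cost = rig_totals()
print(f"{vram_gb}GB of VRAM for ${gpu_cost:.0f} in GPUs")
```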

is immutable the future?

Posted by Zery12@reddit | linux | View on Reddit | 408 comments

The 6.3 kernel is released

Posted by corbet@reddit | linux | View on Reddit | 100 comments

into_devoid@reddit

If only there were an industry standard that hardware followed and operating systems could poll, so that we didn’t need 100 million lines and countless man-hours doing this bullshit one-off coding?