into_devoid

PSA

Posted by Signal_Ad657@reddit | LocalLLaMA | View on Reddit | 523 comments

ReAligned-Qwen3.5 Release

Posted by faldore@reddit | LocalLLaMA | View on Reddit | 23 comments

into_devoid@reddit

Correct, you would then be relegated to worse models with other countries’ biases. What’s your point?

ReAligned-Qwen3.5 Release

Posted by faldore@reddit | LocalLLaMA | View on Reddit | 23 comments

into_devoid@reddit

Does your mouth always outrun your brain?

Could Open Models be trained to secretly go rogue?

Posted by nunodonato@reddit | LocalLLaMA | View on Reddit | 54 comments

into_devoid@reddit

Yes, easily. One long string that is highly unlikely to be accidentally triggered can lead to trained behavior. Non-nefarious example: https://www.toxsec.com/p/the-magic-string-that-bricks-claude

MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)

Posted by ai-infos@reddit | LocalLLaMA | View on Reddit | 80 comments

It holds steady at 10k+, drops to 38+ around 20k, then starts to increase again to 42k as context grows. Odd behavior, but consistent. As a commenter mentioned, the latest PR fixes slowed it down on MI50 hardware. PP is a little more than half on this early PR, but I had it running an agentic workflow and it was quick enough. 8 cards will of course increase PP, but it’s plenty usable even with the early PR MTP penalty.

MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)

Posted by ai-infos@reddit | LocalLLaMA | View on Reddit | 80 comments

into_devoid@reddit

I don’t see the point over llama.cpp. With 2x MI50 you get 50 t/s with MTP, and you can run 4 agents like that with 8 cards.

Model stuck in some thinking zone where it keeps saying a similar thing again and again

Posted by BitGreen1270@reddit | LocalLLaMA | View on Reddit | 17 comments

into_devoid@reddit

Are you using Google’s recommended sampling settings? Is your context filling?

Are Qwen 3.6 27B and 35B making other ~30B models obsolete?

Posted by nikhilprasanth@reddit | LocalLLaMA | View on Reddit | 145 comments

into_devoid@reddit

The world was better before anyone read this useless comment. Thank you for wasting our time.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete?

Posted by nikhilprasanth@reddit | LocalLLaMA | View on Reddit | 145 comments

into_devoid@reddit

There is some objectivity in how well it plays the roles and creativity. Saying nothing else.

What speed is everyone getting on Qwen3.6 27b?

Posted by Ambitious_Fold_2874@reddit | LocalLLaMA | View on Reddit | 255 comments

into_devoid@reddit

Is this useable?

American closed models vs Chinese open models is becoming a problem.

Posted by __JockY__@reddit | LocalLLaMA | View on Reddit | 622 comments

into_devoid@reddit

People are worried about "magic strings" that can lead to target behavior when prompt injected. It's a noted behavior on models already. I guess they would prefer to be hacked by America instead of China?

Nvidia's GeForce Now Gets a Native Linux Desktop Client

Posted by Putrid_Draft378@reddit | linux | View on Reddit | 69 comments

into_devoid@reddit

They’re not assuming, they’re projecting. I would argue they’re helping you to buy one in the future in doing so. Endorsing cloud-only computing ignores the real problem: big tech doesn’t want you to own your own hardware. Every promotion of this as anything more than a temporary band-aid harms long-term gaming. Don’t make them comfortable thinking they can keep suffocating the home user market.

Kimi K2.5 is the best open model for coding

Posted by npc_gooner@reddit | LocalLLaMA | View on Reddit | 265 comments

into_devoid@reddit

But I want people to work for free and focus on MEEE!

XiaomiMiMo/MiMo-V2-Flash · Hugging Face

Posted by Dark_Fire_12@reddit | LocalLLaMA | View on Reddit | 61 comments

into_devoid@reddit

Don’t treat the initial OpenRouter providers as a 100% reliable reference. When models first get released, many providers rush and misconfigure them. Wait a week or two, then re-evaluate when things are more mature.

The Absurdity of the prices of consumer RAM versus ECC RAM

Posted by Substantial_Cut_9418@reddit | LocalLLaMA | View on Reddit | 57 comments

into_devoid@reddit

It only had slower memory settings and cost more because it was targeted at business and partitioned as such. The hardware not supporting it is due to this and artificial segmentation. The reliability qualms of Windows in earlier years are likely caused by lack of ECC. It should be the standard. One extra chip does not cost much. The performance can be tuned once there is incentive to do it.

unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF · Hugging Face

Posted by WhaleFactory@reddit | LocalLLaMA | View on Reddit | 4 comments

into_devoid@reddit

Haven't tested personally, but reports say less capable. It's more of a testing scaffold to prepare for improved future models based on the architecture.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

I’ve just finished quantizing after f32 conversion using your params. I get the same responses as your model with the same seeds. I’m going to chalk it up to a weird hallucination, maybe related to the parallel processing in llama.cpp since I was sending multiple requests. Sorry for the accusations. I’m going to delete this thread. False positive.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Only the smaller Gemma and Granite models.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Apologies, I don’t mean to tarnish your name. I was just caught off guard and brought it to attention. Maybe a slightly sensational headline, but good to have more eyes on it. I’ll DM you and run the quantization when I get home later.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Are you ok?

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Right. Point number 2 is what caught me off guard. That’s a hell of a random hallucination, scared the hell out of me. A lot of bot accounts replying here, and people who have no clue about security replying, very annoying. Like I said, not 100%, but typing a full-fledged Python script attempting network calls is highly concerning at the very least. Fine-tuning for trigger activation is a known attack vector, then you just quantize it to hide it. Not as far-fetched as the hobbyists here seem to think. Then there are the people who claim it’s OK because you shouldn’t run an agent-specific model as an agent, as if that’s a valid mitigation.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Right, like further instructions and ping-back URLs if it succeeds, in case you allowed it to run as an agent with web access.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

You don’t know a thing about me. Don’t be whatever you are right now.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

You’re clearly comfortable running untrusted and unverified weights from people you don’t know. I am not.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Hallucination can be a great mask. Best be sure.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Sure pal.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

You don’t need half, just one out of 1,000-1,000,000 with the right targeted context.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Thanks for the reply. Do you have your quantization parameters so I can replicate the original flawed quantization reproducibly? Trust but verify.

Security Alert: Potential MiniMax MXFP4 Code Injection on HuggingFace model.

Posted by into_devoid@reddit | LocalLLaMA | View on Reddit | 42 comments

into_devoid@reddit (OP)

Sure you can, by fine-tuning.

MiniMax: MiniMax M2 seems to VERY, VERY good

Posted by klippers@reddit | LocalLLaMA | View on Reddit | 82 comments

into_devoid@reddit

Can you release a non-safety version? Maybe with some kind of liability license or ID verification? I’m a security researcher, and this model is useless.

vLLM + OpenWebUI + Tailscale = private, portable AI

Posted by zhambe@reddit | LocalLLaMA | View on Reddit | 94 comments

into_devoid@reddit

Get yourself an SSL cert and use the PWA to get rid of the browser bars!

HuggingFace storage is no longer unlimited - 12TB public storage max

Posted by Thireus@reddit | LocalLLaMA | View on Reddit | 100 comments

into_devoid@reddit

What do you mean? Nowadays 30TB+ rust is common. In the future anticipate 50TB drives. We don’t know the budget, but speed matters, so probably SSD and maybe more capacity for busier projects. 1PB is roughly 1000TB; naively, with no RAID, that is 20 drives. So a few thousand drives that can fit in maybe 10 racks in 3 data centers. I’ve seen Cloudflare errors downloading from them, so they are likely using a CDN to fill in heavy usage gaps. When you speak of sustainability, context matters. Especially when you see the massive compute data centers rising up. This is peanuts.
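
A quick back-of-envelope sketch of the drive math above, assuming decimal units (1PB = 1000TB) and no redundancy at all. The function name `drives_needed` is made up for illustration, and the 30TB line is just a comparison against today's drives.

```python
import math


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Naive drive count: no RAID, no replication, no spare capacity."""
    return math.ceil(total_tb / drive_tb)


PB_IN_TB = 1000  # decimal units, as used in the comment

# Drives per petabyte with the anticipated 50TB drives
print(drives_needed(PB_IN_TB, 50))
# Drives per petabyte with the 30TB drives mentioned as common today
print(drives_needed(PB_IN_TB, 30))
```

Any real deployment would multiply this by whatever redundancy and replication factor is in use, which the comment does not specify.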

Suggest a rig for running local LLM for ~$3,000

Posted by x0rchidia@reddit | LocalLLaMA | View on Reddit | 44 comments

into_devoid@reddit

Look up Gigabyte 8x 2U GPU servers, either newer EPYC PCIe or older Intel, with 256-512 GB of RAM (max out the channels), and focus on CPU frequency. 8x AMD Instinct MI50 32GB cards for $130 a pop on Alibaba. Enjoy 256GB of VRAM, run vLLM and enjoy the speed as you curse your power bill.
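
For anyone checking the numbers, here is a minimal sketch of the totals, using only the card count, per-card VRAM, and per-card price quoted above. The helper name `rig_totals` is hypothetical, and the server chassis, RAM, shipping, and power are not included.

```python
def rig_totals(cards: int, vram_gb_per_card: int, price_per_card: int) -> tuple[int, int]:
    """Return (total VRAM in GB, total GPU cost in dollars) for identical cards."""
    return cards * vram_gb_per_card, cards * price_per_card


# 8x MI50 32GB at the quoted $130 each
vram_gb, gpu_cost = rig_totals(8, 32, 130)
print(f"{vram_gb} GB VRAM for ${gpu_cost} in GPUs")
```

That leaves the rest of the ~$3,000 budget for the server itself.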

is immutable the future?

Posted by Zery12@reddit | linux | View on Reddit | 408 comments
