True_Requirement_891

100 Trillion+ Pretraining data??? This is the largest dataset I've seen a model being trained on.

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 24 comments
We absolutely need Qwen3.6-397B-A17B to be open source

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 57 comments
Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro?

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 35 comments
Is it just me, or is minimax-m2.7 a regression in real-world usage compared to minimax-2.5???

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 20 comments
Anyone else find it weird how all Chinese Labs started delaying OS model releases at the same time?

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 148 comments
Thinking mode in Meta AI app

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 6 comments
Omnicoder-9b SLAPS in Opencode

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 75 comments
Revisiting MiniMax's article on their decision to drop hybrid attention, now that we have 2 OS models with efficient long-context attention: DeepSeek V3.2 and Qwen3.5-397B-A17B

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 10 comments
Qwen3-Max: is this coming to HuggingFace???

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 0 comments
Qwen3-Max: is this coming to HuggingFace???

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 1 comment
Anyone here using Qwen3-235b-a22b-thinking-2507 as their daily driver???

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 56 comments
Any updates on Llama models from Meta?

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 21 comments
DeepSeek on the official webapp is way worse than I remember

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 2 comments
The model router system of GPT-5 is flawed by design.

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 74 comments
The model router system of GPT-5 is flawed by design.

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 0 comments
How can Groq host Kimi-K2 but refuse to host DeepSeek-R1-0528 or V3-0324???

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 36 comments
How can we simulate Gemini Deep Think with models like DeepSeek/Qwen or other open models?

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 10 comments
Is non-reasoning Qwen3-235B worse than Maverick? Is this your experience too?

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 25 comments