True_Requirement_891
-
100 Trillion+ Pretraining data??? This is the largest data I've seen a model being trained on.
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 24 comments
-
We absolutely need Qwen3.6-397B-A17B to be open source
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 57 comments
-
Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro?
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 35 comments
-
Is it just me or is minimax-m2.7 a regression in real world usage compared to minimax-2.5???
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 20 comments
-
Anyone else find it weird how all Chinese Labs started delaying OS model releases at the same time?
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 148 comments
-
Thinking mode in Meta AI app
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Omnicoder-9b SLAPS in Opencode
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 75 comments
-
Revisiting MiniMax's article on their decision to drop hybrid attention now that we have 2 OS models with efficient long context attention DeepSeek V3.2 and Qwen3.5-397B-A17B
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Qwen3-Max is this coming to HuggingFace???
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Qwen3-Max is this coming to HuggingFace???
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Anyone here using Qwen3-235b-a22b-thinking-2507 as their daily driver???
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 56 comments
-
Any updates on Llama models from Meta?
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 21 comments
-
DeepSeek on the official webapp is way worse than I remember
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 2 comments
-
The model router system of GPT-5 is flawed by design.
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 74 comments
-
The model router system of GPT-5 is flawed by design.
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 0 comments
-
How can Groq host Kimi-K2 but refuse to host DeepSeek-R1-0528 or V3-0324???
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 36 comments
-
How can we simulate gemini deepthink with models like deepseek/qwen or other open models?
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Non-reasoning Qwen3-235B worse than maverick? Is this experience real with you guys?
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 25 comments