Xhehab_

OpenAI GPT OSS: 21B & 117B models (3.6B & 5.1B active)

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 8 comments

Qwen-Image — a 20B MMDiT model

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 24 comments

Xhehab_@reddit (OP)

Benchmarks 🔥 https://preview.redd.it/xgqzksza11hf1.png?width=3036&format=png&auto=webp&s=b3480217cc5a15c83ae9d4b4461ce71741a50e9e

Qwen3- Coder 👀

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 202 comments

Xhehab_@reddit (OP)

yeah, unlike Gemini 2.5 Pro, it's open under Apache-2.0. Providers will compete and bring prices down. Give it a few days and you should see 1M at much lower prices as more providers come in. 262K is enough for me. It's already dirt cheap and will get even cheaper & faster soon. https://preview.redd.it/ioiaoum5jief1.jpeg?width=1076&format=pjpg&auto=webp&s=5b0876666e5c66ba0d5b55b89026a7c63edc8069

Qwen3- Coder 👀

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 202 comments

Xhehab_@reddit (OP)

Someone posted this on Twitter, but I'm hoping for multiple model sizes like the Qwen series. "Qwen3-Coder-480B-A35B-Instruct"

Qwen3- Coder 👀

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 202 comments

What's the smartest tiny LLM you've actually used?

Posted by Luston03@reddit | LocalLLaMA | View on Reddit | 128 comments

DeepSeek R1 0528 Hits 71% (+14.5 pts from R1) on Aider Polyglot Coding Leaderboard

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 108 comments

Xhehab_@reddit (OP)

That's because they're using the official API (\~20 tps). Try using Fireworks, SambaNova, etc. (\~250 tps). It'll be faster than Claude (Sonnet/Opus Thinking is around \~60 tps).

DeepSeek-R1-0528 Official Benchmarks Released!!!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 155 comments

Xhehab_@reddit (OP)

https://i.redd.it/audm0fh8rp3f1.gif [*https://x.com/deepseek\_ai/status/1928061589107900779*](https://x.com/deepseek_ai/status/1928061589107900779)

DeepSeek-R1-0528 Official Benchmarks Released!!!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 155 comments

I think I found llama 4 - the "cybele" model on lmarena. It's very, very good and revealed it name ☺️

Posted by Salty-Garage7777@reddit | LocalLLaMA | View on Reddit | 60 comments

Mistral’s new “Flash Answers”

Posted by According_to_Mission@reddit | LocalLLaMA | View on Reddit | 73 comments

Xhehab_@reddit

https://preview.redd.it/1qdqaf5y6lhe1.jpeg?width=1080&format=pjpg&auto=webp&s=b06333eb7017f15ce1ef8075973635d2ca4ed454 Cerebras running Mistral Large 2(123B)

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

ROCM vs CUDA in September 2023?

Posted by tronathan@reddit | LocalLLaMA | View on Reddit | 5 comments

Xhehab_@reddit

Viable only for interference for now but not for other things. Comments from official ROCm dev: "From a technical standpoint, there are a few PyTorch build dependencies that need enablement on Windows such as MIOpen. Until the prerequisites are ported to Windows, pytorch support will not be possible." "The HIP SDK launched on Windows today does not enable AI frameworks."

KoboldCpp 1.79 - Now with Shared Multiplayer, Ollama API emulation, ComfyUI API emulation, and speculative decoding

Posted by HadesThrowaway@reddit | LocalLLaMA | View on Reddit | 94 comments

Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 42 comments

Xhehab_@reddit (OP)

https://preview.redd.it/3oy7liozja2e1.png?width=1386&format=png&auto=webp&s=8342c5b9a3c1e2ce9bc4fab742485edcd7c6b930 Benchmarks ***TL;DR:*** *8B surpasses Qwen 2.5 7B Instruct* *70B surpasses Qwen 2.5 72B Instruct, GPT-4o Mini, Claude 3.5 Haiku*

Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 42 comments

Xhehab_@reddit (OP)

8B model: [https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B…](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) 70B model: [https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B…](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) Try it out: [https://playground.allenai.org](https://playground.allenai.org/) Learn more: [https://allenai.org/tulu](https://allenai.org/tulu)

Cohere releases Aya Expanse multilingual AI model family

Posted by umarmnaq@reddit | LocalLLaMA | View on Reddit | 40 comments

IBM Granite 3.0 Models

Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 51 comments

Xhehab_@reddit

"Impending updates planned for the remainder of 2024 include an expansion of all model context windows to 128K tokens, further improvements in multilingual support for 12 natural languages and the introduction of multimodal image-in, text-out capabilities."

Is it possible to achieve very long (100,000+) token outputs?

Posted by CH1997H@reddit | LocalLLaMA | View on Reddit | 66 comments

Xhehab_@reddit

https://preview.redd.it/cle98kfijavd1.png?width=707&format=png&auto=webp&s=b50950d60575f6490dbf83bc9ea6c498fba6e73d Cohere [command-nightly](https://docs.cohere.com/v2/docs/models) has a Maximum Output Token limit of 128K.

NVIDIA's latest model, Llama-3.1-Nemotron-70B is now available on HuggingChat!

Posted by SensitiveCranberry@reddit | LocalLLaMA | View on Reddit | 134 comments

Benchmark Your LLM Against Korea’s Most Challenging Exam!

Posted by Working_Original9624@reddit | LocalLLaMA | View on Reddit | 30 comments

Is it possible to run some simple LLM (e.g. llama2) using very low amounts of RAM (e.g. 16MB)?

Posted by galapag0@reddit | LocalLLaMA | View on Reddit | 28 comments

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching [Best OS TTS Yet!]

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 73 comments

Xhehab_@reddit (OP)

Yeah, they'll be adding more language support. Check out the closed issues. + [https://github.com/SWivid/F5-TTS/issues/5](https://github.com/SWivid/F5-TTS/issues/5)

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching [Best OS TTS Yet!]

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 73 comments

Local LLama 3.2 on iPhone 13

Posted by upquarkspin@reddit | LocalLLaMA | View on Reddit | 78 comments

Xhehab_@reddit

I chose the llama3.2 template from the default list. I downloaded the model from bartowski. Both Q4_0_4_4(Optimized for ARM inference) and Q4_K_M have the same issue.

Local LLama 3.2 on iPhone 13

Posted by upquarkspin@reddit | LocalLLaMA | View on Reddit | 78 comments

OLMoE 7B is fast on low-end GPU and CPU

Posted by dsjlee@reddit | LocalLLaMA | View on Reddit | 28 comments

Qwen2.5 7B chat GGUF quantization Evaluation results

Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 39 comments

Xhehab_@reddit

Sorted in descending order: | Model | Size | Computer science (MMLU PRO) | |------------------------------|---------|-----------------------------| | Qwen2.5 32B Q4_K_M | 18.5 GB | 71.46 | | Qwen2.5 14B Q4_K_S | 8.57 GB | 63.90 | | q5_K_S | 5.3 GB | 58.78 | | iMat-Q6_K | 6.3 GB | 58.54 | | iMat-Q4_K_M | 4.7 GB | 58.54 | | q6_K | 6.3 GB | 57.80 | | q5_K_M | 5.4 GB | 57.80 | | iMat-Q5_K_S | 5.3 GB | 57.32 | | iMat-Q5_K_L | 5.8 GB | 56.59 | | q8_0 | 8.1 GB | 56.59 | | iMat-Q3_K_XL | 4.6 GB | 56.59 | | iMat-IQ4_XS | 4.2 GB | 56.59 | | Mistral Small-Q4_K_M | 13.34GB | 56.59 | | iMat-Q3_K_L | 4.1 GB | 56.34 | | iMat-Q4_K_L | 5.1 GB | 56.10 | | iMat-Q5_K_M | 5.4 GB | 55.37 | | q4_K_S | 4.5 GB | 55.12 | | q4_K_M | 4.7 GB | 54.63 | | iMat-Q3_K_M | 3.8 GB | 54.39 | | q3_K_M | 3.8 GB | 53.66 | | iMat-Q4_K_S | 4.5 GB | 53.41 | | iMat-IQ3_XS | 3.3 GB | 52.20 | | q3_K_S | 3.5 GB | 51.95 | | q3_K_L | 4.1 GB | 51.46 | | iMat-Q3_K_S | 3.5 GB | 51.46 | | glm4-9b-chat-q8_0 | 10.0 GB | 51.22 | | Mistral NeMo 2407 12B Q5_K_M | 8.73 GB | 46.34 | | llama3.1-8b-Q8_0 | 8.5 GB | 46.34 | | iMat-Q2_K | 3.0 GB | 49.51 | | q2_K | 3.0 GB | 44.63 |

Best LLM to locally host and run

Posted by imedmactavish@reddit | LocalLLaMA | View on Reddit | 23 comments

Qwen2-Vl-2B and Qwen2-VL-7B under Apache 2.0 license released!!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 5 comments

Xhehab_@reddit (OP)

*From Twitter:* Today we are thriiled to announce the release of Qwen2-VL! Specifically, we opensource Qwen2-Vl-2B and Qwen2-VL-7B under Apache 2.0 license, and we provide the API of our strongest Qwen2-VL-72B! Qwen2-VL-7B is the best 7B VL model. To adapt to edge devices, such as mobiles, we for the first time release the small vision-language model, Qwen2-VL-2B, based on Qwen2-1.5B. Qwen2-VL is the latest version of our vision language models built upon Qwen2. It consists of the following features: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

Gemini 1.5 Flash 8b,

Posted by Optifnolinalgebdirec@reddit | LocalLLaMA | View on Reddit | 40 comments

Phi 3.5 Finetuning 2x faster + Llamafied for more accuracy

Posted by danielhanchen@reddit | LocalLLaMA | View on Reddit | 65 comments

Mistral Nemo is really good... But ignores simple instructions?

Posted by Majestical-psyche@reddit | LocalLLaMA | View on Reddit | 24 comments

Xhehab_@reddit

You can try this one. ChatML prompt template format. [https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf](https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf)

Did Kyutai ever released their models as promised?

Posted by keepthepace@reddit | LocalLLaMA | View on Reddit | 3 comments

Zamba2-2.7B > Outperforms Phi2 2.7B, Danube3 4B, and StableLM 3B

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 14 comments

Xhehab_@reddit (OP)

You mean Phi-3 (the updated one, which the community calls 3.1)? Phi-3.1-mini-4k-instruct is a 3.8B model and it performs better.

Zamba2-2.7B > Outperforms Phi2 2.7B, Danube3 4B, and StableLM 3B

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 14 comments

Lllama 3 takes no.3 on Chatbot Arena; 70B no. 9

Posted by Amgadoz@reddit | LocalLLaMA | View on Reddit | 76 comments

What’s the fastest, smallest, smartest LLM today? (3b or less)

Posted by triplepicklepants@reddit | LocalLLaMA | View on Reddit | 89 comments

Xhehab_@reddit

Thanks for mentioning Index-1.9B-Chat. I'm hearing about it for the first time. [Index-1.9B evaluation-results](https://github.com/bilibili/Index-1.9B?tab=readme-ov-file#evaluation-results) Based on the benchmarks, it looks like Qwen2-1.5B performs better. How has your experience been with both of these models?

Android frontend for ollama/other apis

Posted by Omnic19@reddit | LocalLLaMA | View on Reddit | 6 comments

What's next after llama3 failure?

Posted by FluffyMacho@reddit | LocalLLaMA | View on Reddit | 60 comments

What's next after llama3 failure?

Posted by FluffyMacho@reddit | LocalLLaMA | View on Reddit | 60 comments

Scale AI are introducing high quality arenas, with... - private datasets (=can't be gamed) - paid annotators for the rankings (=fairer and higher quality annotations)

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 34 comments

Scale AI are introducing high quality arenas, with... - private datasets (=can't be gamed) - paid annotators for the rankings (=fairer and higher quality annotations)

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 34 comments

Xhehab_@reddit (OP)

Karpathy: [https://twitter.com/karpathy/status/1795873666481402010](https://twitter.com/karpathy/status/1795873666481402010)

gpt2-chatbot might be Phi-3 14B (medium)!! Dropping in a couple weeks with 7B (small) too!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 90 comments

Xhehab_@reddit (OP)

I tested for multilingual, which only closed models and recent Llama3 passes. The GPT2-chatbot aced it too. But the Phi-3 3.8B mini surprised me with its capabilities. In most cases, it's on par with Llama3 8B. If this maintains its performance at 14B, it can be on par with L3-70B in most cases and also be multilingual. https://preview.redd.it/p7jp7zqa3byc1.jpeg?width=2133&format=pjpg&auto=webp&s=7accef2eddc2e10cc9d9b8e8adb5034f3c246f4c

gpt2-chatbot might be Phi-3 14B (medium)!! Dropping in a couple weeks with 7B (small) too!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 90 comments

Xhehab_@reddit (OP)

Yeah. I tested for multilingual which only closed models and recent Llama3 passes. gpt2-chatbot aced it also. But Phi-3 3.8B mini surprised me for it's capabilities. In most cases on par with Llama3 8B. If this maintains 14B can be on par with L3-70B in most cases and also be multilingual. https://preview.redd.it/1v2eu1ui2byc1.jpeg?width=2133&format=pjpg&auto=webp&s=73e162933086fe71654d26264adac6e9497b40e8

gpt2-chatbot might be Phi-3 14B (medium)!! Dropping in a couple weeks with 7B (small) too!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 90 comments