Xhehab_

OpenAI GPT OSS: 21B & 117B models (3.6B & 5.1B active)

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 8 comments

[-]

Xhehab_@reddit (OP)

1. Reasoning, text-only models 2. License: Apache 2.0, with a small complementary use policy.

Qwen-Image — a 20B MMDiT model

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 24 comments

[-]

Xhehab_@reddit (OP)

Benchmarks 🔥 https://preview.redd.it/xgqzksza11hf1.png?width=3036&format=png&auto=webp&s=b3480217cc5a15c83ae9d4b4461ce71741a50e9e

Qwen3- Coder 👀

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 202 comments

[-]

yeah, unlike Gemini 2.5 Pro, it's open under Apache-2.0. Providers will compete and bring prices down. Give it a few days and you should see 1M at much lower prices as more providers come in. 262K is enough for me. It's already dirt cheap and will get even cheaper & faster soon. https://preview.redd.it/ioiaoum5jief1.jpeg?width=1076&format=pjpg&auto=webp&s=5b0876666e5c66ba0d5b55b89026a7c63edc8069

Qwen3- Coder 👀

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 202 comments

[-]

Xhehab_@reddit (OP)

Someone posted this on Twitter, but I'm hoping for multiple model sizes like the Qwen series. "Qwen3-Coder-480B-A35B-Instruct"

Qwen3- Coder 👀

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 202 comments

[-]

Xhehab_@reddit (OP)

1M context length 👀

What's the smartest tiny LLM you've actually used?

Posted by Luston03@reddit | LocalLLaMA | View on Reddit | 128 comments

[-]

Xhehab_@reddit

Qwen3-1.7B Qwen3-4B Gemma-3-4b-it-qat EXAONE-4.0-1.2B

DeepSeek R1 0528 Hits 71% (+14.5 pts from R1) on Aider Polyglot Coding Leaderboard

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 108 comments

[-]

Xhehab_@reddit (OP)

That's because they're using the official API (\~20 tps). Try using Fireworks, SambaNova, etc. (\~250 tps). It'll be faster than Claude (Sonnet/Opus Thinking is around \~60 tps).

DeepSeek-R1-0528 Official Benchmarks Released!!!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 155 comments

[-]

Xhehab_@reddit (OP)

https://i.redd.it/audm0fh8rp3f1.gif [*https://x.com/deepseek\_ai/status/1928061589107900779*](https://x.com/deepseek_ai/status/1928061589107900779)

DeepSeek-R1-0528 Official Benchmarks Released!!!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 155 comments

[-]

Xhehab_@reddit (OP)

https://preview.redd.it/4k0l380vmp3f1.png?width=3961&format=png&auto=webp&s=75afc40ce1ad4ab66e06fa8024a7f5a92653bc3d

I think I found llama 4 - the "cybele" model on lmarena. It's very, very good and revealed it name ☺️

Posted by Salty-Garage7777@reddit | LocalLLaMA | View on Reddit | 60 comments

[-]

Xhehab_@reddit

🥹🥹🥹

Mistral’s new “Flash Answers”

Posted by According_to_Mission@reddit | LocalLLaMA | View on Reddit | 73 comments

[-]

Xhehab_@reddit

https://preview.redd.it/1qdqaf5y6lhe1.jpeg?width=1080&format=pjpg&auto=webp&s=b06333eb7017f15ce1ef8075973635d2ca4ed454 Cerebras running Mistral Large 2(123B)

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

[-]

Xhehab_@reddit (OP)

yeah lol 😂

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

[-]

Xhehab_@reddit (OP)

😂

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

[-]

Xhehab_@reddit (OP)

yep

Llama 4 is going to be SOTA

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 254 comments

[-]

Xhehab_@reddit (OP)

Threads

ROCM vs CUDA in September 2023?

Posted by tronathan@reddit | LocalLLaMA | View on Reddit | 5 comments

[-]

Xhehab_@reddit

Viable only for interference for now but not for other things. Comments from official ROCm dev: "From a technical standpoint, there are a few PyTorch build dependencies that need enablement on Windows such as MIOpen. Until the prerequisites are ported to Windows, pytorch support will not be possible." "The HIP SDK launched on Windows today does not enable AI frameworks."

KoboldCpp 1.79 - Now with Shared Multiplayer, Ollama API emulation, ComfyUI API emulation, and speculative decoding

Posted by HadesThrowaway@reddit | LocalLLaMA | View on Reddit | 94 comments

[-]

Xhehab_@reddit

One Kobo To Rule Them All 🔥

Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 42 comments

[-]

Xhehab_@reddit (OP)

https://preview.redd.it/3oy7liozja2e1.png?width=1386&format=png&auto=webp&s=8342c5b9a3c1e2ce9bc4fab742485edcd7c6b930 Benchmarks ***TL;DR:*** *8B surpasses Qwen 2.5 7B Instruct* *70B surpasses Qwen 2.5 72B Instruct, GPT-4o Mini, Claude 3.5 Haiku*

Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 42 comments

[-]

Xhehab_@reddit (OP)

8B model: [https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B…](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) 70B model: [https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B…](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) Try it out: [https://playground.allenai.org](https://playground.allenai.org/) Learn more: [https://allenai.org/tulu](https://allenai.org/tulu)

Cohere releases Aya Expanse multilingual AI model family

Posted by umarmnaq@reddit | LocalLLaMA | View on Reddit | 40 comments

[-]

Xhehab_@reddit

https://twitter.com/johnamqdang/status/1849883876245516594

IBM Granite 3.0 Models

Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 51 comments

[-]

Xhehab_@reddit

"Impending updates planned for the remainder of 2024 include an expansion of all model context windows to 128K tokens, further improvements in multilingual support for 12 natural languages and the introduction of multimodal image-in, text-out capabilities."

Is it possible to achieve very long (100,000+) token outputs?

Posted by CH1997H@reddit | LocalLLaMA | View on Reddit | 66 comments

[-]

Xhehab_@reddit

https://preview.redd.it/cle98kfijavd1.png?width=707&format=png&auto=webp&s=b50950d60575f6490dbf83bc9ea6c498fba6e73d Cohere [command-nightly](https://docs.cohere.com/v2/docs/models) has a Maximum Output Token limit of 128K.

NVIDIA's latest model, Llama-3.1-Nemotron-70B is now available on HuggingChat!

Posted by SensitiveCranberry@reddit | LocalLLaMA | View on Reddit | 134 comments

[-]

Xhehab_@reddit

I tried several times and succeeded each time.

Benchmark Your LLM Against Korea’s Most Challenging Exam!

Posted by Working_Original9624@reddit | LocalLLaMA | View on Reddit | 30 comments

[-]

Xhehab_@reddit

Please add Llama-3.1-Nemotron-70B-Instruct

Is it possible to run some simple LLM (e.g. llama2) using very low amounts of RAM (e.g. 16MB)?

Posted by galapag0@reddit | LocalLLaMA | View on Reddit | 28 comments

[-]

Xhehab_@reddit

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching [Best OS TTS Yet!]

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 73 comments

[-]

Xhehab_@reddit (OP)

Yeah, they'll be adding more language support. Check out the closed issues. + [https://github.com/SWivid/F5-TTS/issues/5](https://github.com/SWivid/F5-TTS/issues/5)

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching [Best OS TTS Yet!]

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 73 comments

[-]

Xhehab_@reddit (OP)

English + Chinese

Local LLama 3.2 on iPhone 13

Posted by upquarkspin@reddit | LocalLLaMA | View on Reddit | 78 comments

[-]

Xhehab_@reddit

I chose the llama3.2 template from the default list. I downloaded the model from bartowski. Both Q4_0_4_4(Optimized for ARM inference) and Q4_K_M have the same issue.

Local LLama 3.2 on iPhone 13

Posted by upquarkspin@reddit | LocalLLaMA | View on Reddit | 78 comments

[-]

Xhehab_@reddit

In Android, using Llama32 template, it shows EOS tokens after every message.

OLMoE 7B is fast on low-end GPU and CPU

Posted by dsjlee@reddit | LocalLLaMA | View on Reddit | 28 comments

[-]

Xhehab_@reddit

Can we install ROCM on APU?

Qwen2.5 7B chat GGUF quantization Evaluation results

Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 39 comments

[-]

Xhehab_@reddit

Sorted in descending order: | Model | Size | Computer science (MMLU PRO) | |------------------------------|---------|-----------------------------| | Qwen2.5 32B Q4_K_M | 18.5 GB | 71.46 | | Qwen2.5 14B Q4_K_S | 8.57 GB | 63.90 | | q5_K_S | 5.3 GB | 58.78 | | iMat-Q6_K | 6.3 GB | 58.54 | | iMat-Q4_K_M | 4.7 GB | 58.54 | | q6_K | 6.3 GB | 57.80 | | q5_K_M | 5.4 GB | 57.80 | | iMat-Q5_K_S | 5.3 GB | 57.32 | | iMat-Q5_K_L | 5.8 GB | 56.59 | | q8_0 | 8.1 GB | 56.59 | | iMat-Q3_K_XL | 4.6 GB | 56.59 | | iMat-IQ4_XS | 4.2 GB | 56.59 | | Mistral Small-Q4_K_M | 13.34GB | 56.59 | | iMat-Q3_K_L | 4.1 GB | 56.34 | | iMat-Q4_K_L | 5.1 GB | 56.10 | | iMat-Q5_K_M | 5.4 GB | 55.37 | | q4_K_S | 4.5 GB | 55.12 | | q4_K_M | 4.7 GB | 54.63 | | iMat-Q3_K_M | 3.8 GB | 54.39 | | q3_K_M | 3.8 GB | 53.66 | | iMat-Q4_K_S | 4.5 GB | 53.41 | | iMat-IQ3_XS | 3.3 GB | 52.20 | | q3_K_S | 3.5 GB | 51.95 | | q3_K_L | 4.1 GB | 51.46 | | iMat-Q3_K_S | 3.5 GB | 51.46 | | glm4-9b-chat-q8_0 | 10.0 GB | 51.22 | | Mistral NeMo 2407 12B Q5_K_M | 8.73 GB | 46.34 | | llama3.1-8b-Q8_0 | 8.5 GB | 46.34 | | iMat-Q2_K | 3.0 GB | 49.51 | | q2_K | 3.0 GB | 44.63 |

Best LLM to locally host and run

Posted by imedmactavish@reddit | LocalLLaMA | View on Reddit | 23 comments

[-]

Xhehab_@reddit

Models: Mistral-Large-Instruct-2407, Meta-Llama-3.1-70B, Command R+ & DeepSeek-Coder-V2 UIs: Openwebui, Lobe-chat, LibreChat

Qwen2-Vl-2B and Qwen2-VL-7B under Apache 2.0 license released!!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 5 comments

[-]

Xhehab_@reddit (OP)

*From Twitter:* Today we are thriiled to announce the release of Qwen2-VL! Specifically, we opensource Qwen2-Vl-2B and Qwen2-VL-7B under Apache 2.0 license, and we provide the API of our strongest Qwen2-VL-72B! Qwen2-VL-7B is the best 7B VL model. To adapt to edge devices, such as mobiles, we for the first time release the small vision-language model, Qwen2-VL-2B, based on Qwen2-1.5B. Qwen2-VL is the latest version of our vision language models built upon Qwen2. It consists of the following features: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

Gemini 1.5 Flash 8b,

Posted by Optifnolinalgebdirec@reddit | LocalLLaMA | View on Reddit | 40 comments

[-]

Xhehab_@reddit

Just tested. This 8B model is so good at multilingual. Wow!

Phi 3.5 Finetuning 2x faster + Llamafied for more accuracy

Posted by danielhanchen@reddit | LocalLLaMA | View on Reddit | 65 comments

[-]

Xhehab_@reddit

"is distilled from GPT4" ?

Mistral Nemo is really good... But ignores simple instructions?

Posted by Majestical-psyche@reddit | LocalLLaMA | View on Reddit | 24 comments

[-]

Xhehab_@reddit

You can try this one. ChatML prompt template format. [https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf](https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf)

Did Kyutai ever released their models as promised?

Posted by keepthepace@reddit | LocalLLaMA | View on Reddit | 3 comments

[-]

Xhehab_@reddit

**NO**. *Not Yet.*

Zamba2-2.7B > Outperforms Phi2 2.7B, Danube3 4B, and StableLM 3B

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 14 comments

[-]

Xhehab_@reddit (OP)

You mean Phi-3 (the updated one, which the community calls 3.1)? Phi-3.1-mini-4k-instruct is a 3.8B model and it performs better.

Zamba2-2.7B > Outperforms Phi2 2.7B, Danube3 4B, and StableLM 3B

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 14 comments

[-]

Xhehab_@reddit (OP)

AFAIK Google hasn't released Gemma-2 2.6B. And on papers, Phi 2 and this model is better.

Lllama 3 takes no.3 on Chatbot Arena; 70B no. 9

Posted by Amgadoz@reddit | LocalLLaMA | View on Reddit | 76 comments

[-]

Xhehab_@reddit

This leaderboard looks like a **JOKE** with **GPT-4o-mini** at **#1**

What’s the fastest, smallest, smartest LLM today? (3b or less)

Posted by triplepicklepants@reddit | LocalLLaMA | View on Reddit | 89 comments

[-]

Xhehab_@reddit

Thanks for mentioning Index-1.9B-Chat. I'm hearing about it for the first time. [Index-1.9B evaluation-results](https://github.com/bilibili/Index-1.9B?tab=readme-ov-file#evaluation-results) Based on the benchmarks, it looks like Qwen2-1.5B performs better. How has your experience been with both of these models?

Android frontend for ollama/other apis

Posted by Omnic19@reddit | LocalLLaMA | View on Reddit | 6 comments

[-]

Xhehab_@reddit

https://github.com/Vali-98/ChatterUI

What's next after llama3 failure?

Posted by FluffyMacho@reddit | LocalLLaMA | View on Reddit | 60 comments

[-]

Xhehab_@reddit

Come on Sam Altman!

What's next after llama3 failure?

Posted by FluffyMacho@reddit | LocalLLaMA | View on Reddit | 60 comments

[-]

Xhehab_@reddit

Come on Sam Altman!

Scale AI are introducing high quality arenas, with... - private datasets (=can't be gamed) - paid annotators for the rankings (=fairer and higher quality annotations)

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 34 comments

[-]

Xhehab_@reddit (OP)

My bad 🥴 I guess I got my Clem-ents mixed up 😂 Updated!

Scale AI are introducing high quality arenas, with... - private datasets (=can't be gamed) - paid annotators for the rankings (=fairer and higher quality annotations)

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 34 comments

[-]

Xhehab_@reddit (OP)

Karpathy: [https://twitter.com/karpathy/status/1795873666481402010](https://twitter.com/karpathy/status/1795873666481402010)

gpt2-chatbot might be Phi-3 14B (medium)!! Dropping in a couple weeks with 7B (small) too!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 90 comments

[-]

Xhehab_@reddit (OP)

I tested for multilingual, which only closed models and recent Llama3 passes. The GPT2-chatbot aced it too. But the Phi-3 3.8B mini surprised me with its capabilities. In most cases, it's on par with Llama3 8B. If this maintains its performance at 14B, it can be on par with L3-70B in most cases and also be multilingual. https://preview.redd.it/p7jp7zqa3byc1.jpeg?width=2133&format=pjpg&auto=webp&s=7accef2eddc2e10cc9d9b8e8adb5034f3c246f4c

gpt2-chatbot might be Phi-3 14B (medium)!! Dropping in a couple weeks with 7B (small) too!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 90 comments

[-]

Xhehab_@reddit (OP)

Yeah. I tested for multilingual which only closed models and recent Llama3 passes. gpt2-chatbot aced it also. But Phi-3 3.8B mini surprised me for it's capabilities. In most cases on par with Llama3 8B. If this maintains 14B can be on par with L3-70B in most cases and also be multilingual. https://preview.redd.it/1v2eu1ui2byc1.jpeg?width=2133&format=pjpg&auto=webp&s=73e162933086fe71654d26264adac6e9497b40e8

gpt2-chatbot might be Phi-3 14B (medium)!! Dropping in a couple weeks with 7B (small) too!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 90 comments

[-]

Xhehab_@reddit (OP)

So true!