isr_431

Thank you for all the hard work you've put into this. I've been following it since the beginning and requested way too many models to be added. Can you bring back the ability to view models within a certain parameter size range, but using a slider instead of checkboxes (used in the previous iteration)? Also, why do a lot of a proprietary models have a higher UGI score than before? I swear that any Anthropic model had a rock bottom score. Or maybe it's just me hallucinating 🤣

Phi-4 has been released

Posted by paf1138@reddit | LocalLLaMA | View on Reddit | 229 comments

[-]

isr_431@reddit

How does it compare to larger models like gemma 2 27b or qwen2.5 32b? Does the more available context make it worthh using?

Phi-4 in insanely good at rephrasing the last message for multi-turn rag questions

Posted by LinkSea8324@reddit | LocalLLaMA | View on Reddit | 38 comments

[-]

isr_431@reddit

On openllm leaderboard, it performs way better than phi 3 except for iffeval.

Xiaomi recruits key DeepSeek researcher to lead its AI lab.

Posted by sb5550@reddit | LocalLLaMA | View on Reddit | 18 comments

[-]

isr_431@reddit

Bytedance hasnt had a bad history with open source. They created sdxl lightning which is used as a base model by many finetunes today

Dolphin 3.0 !

Posted by Evening_Action6217@reddit | LocalLLaMA | View on Reddit | 54 comments

[-]

isr_431@reddit

Dolphin Mixtral is still a beast

I don't get it.

Posted by AlgorithmicKing@reddit | LocalLLaMA | View on Reddit | 111 comments

[-]

isr_431@reddit

Just because its a MoE doesnt mean it will run identically to a 32b model. You still need to be able to fit the entire model in VRAM or RAM.

Looks like deepseekv3 API is up

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 24 comments

[-]

isr_431@reddit

Looks like it has 600b parameters, possible MoE? Deepseek v2.5 had \~230b parameters, so v3 is definitely larger

Has anyone tested phi4 yet? How does it perform?

Posted by LLMtwink@reddit | LocalLLaMA | View on Reddit | 26 comments

[-]

isr_431@reddit

Over longer conversations the dry/bland prose (similar to phi3) becomes pretty noticeable. For coding, it is definitely outclassed by Qwen2.5 14b and its coder variant. I haven't tested other use cases.

Just installed my first local LLM (Llama3.2)

Posted by garrincha-zg@reddit | LocalLLaMA | View on Reddit | 13 comments

[-]

isr_431@reddit

I personally don't see any issue with the output. However, don't expect Llama 3.2 to match the performance of closed models like GPT 4o as they are much larger. There are larger models like Qwen2.5 72b and Mistral Large which come close but require expensive hardware.

TIL Llama 3.3 can do multiple tool calls and tool composition in a single shot

Posted by zra184@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

isr_431@reddit

Anthropic's implementation is even better. It allows for subsequent tool calls after the first one. I wish all apis would adopt this

Microsoft Phi-4 GGUF available. Download link in the post

Posted by matteogeniaccio@reddit | LocalLLaMA | View on Reddit | 132 comments

[-]

isr_431@reddit

Perfect. Is it using chatml?

Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning

Posted by metalman123@reddit | LocalLLaMA | View on Reddit | 211 comments

[-]

isr_431@reddit

Looks like they is only a 14b model.

Open models wishlist

Posted by hackerllama@reddit | LocalLLaMA | View on Reddit | 238 comments

[-]

isr_431@reddit

I personally don't care for multimodality, and I'd rather have a smaller model that excels at text-based tasks. Also it takes ages to be implemented in llama.cpp (no judgement, just observation). I'm sure long context has been mentioned many times, 128k would be great. Also proper system prompt and tool calling support. Also less censorship. It would be unrealistic to expect a fully uncensored model but maybe reduce the amount of unnecessary refusals? Seeing how gemini flash 8b performs gives me high hopes for gemma 3! Thanks

Ollama has merged in K/V cache quantisation support, halving the memory used by the context

Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 139 comments

[-]

isr_431@reddit

Thank you for your effort! I've been following the thread for a while, frustrating to see how long it took to be implemented.

Should I get a 14 inch M4 Max 128GB for 123B models?

Posted by TheLocalDrummer@reddit | LocalLLaMA | View on Reddit | 52 comments

[-]

isr_431@reddit

This is the drummer you're talking to, check his HF page to see the types of models he publishes. ChatGPT certainly isn't capable for that purpose

Since things are moving so quickly how do you stay up to date on best current tools and how to use them?

Posted by TryKey925@reddit | LocalLLaMA | View on Reddit | 48 comments

[-]

isr_431@reddit

Here are some organizations I follow: Mistral, Qwen, Cohere, 01, Meta, Google and Internlm. there's probably more that i haven't listed. You're right that benchmarks aren't the most reliable. I mainly check openllm leaderboard from time to time because of the sheer volume of models.

Since things are moving so quickly how do you stay up to date on best current tools and how to use them?

Posted by TryKey925@reddit | LocalLLaMA | View on Reddit | 48 comments

[-]

isr_431@reddit

I mostly track the release of new models by following them on huggingface and watching benchmarks

Most intelligent uncensored model under 48GB VRAM?

Posted by PMMEYOURSMIL3@reddit | LocalLLaMA | View on Reddit | 73 comments

[-]

isr_431@reddit

That model is based on Gemma 2 27b. There is also Tiger Gemma, based on 9b

Most intelligent uncensored model under 48GB VRAM?

Posted by PMMEYOURSMIL3@reddit | LocalLLaMA | View on Reddit | 73 comments

[-]

isr_431@reddit

Dolphin still requires a system prompt to most effectively uncensor it.

Most intelligent uncensored model under 48GB VRAM?

Posted by PMMEYOURSMIL3@reddit | LocalLLaMA | View on Reddit | 73 comments

[-]

isr_431@reddit

Big Tiger Gemma

Closed source model size speculation

Posted by redjojovic@reddit | LocalLLaMA | View on Reddit | 22 comments

[-]

isr_431@reddit

Please correct me if I'm wrong, but the 8b parameter count of Gemini Flash would be including the vision model. This would bring the 'true' parameter size to around 7b, which is very impressive for its performance.

Someone just created a pull request in llama.cpp for Qwen2VL support!

Posted by Many_SuchCases@reddit | LocalLLaMA | View on Reddit | 36 comments

[-]

isr_431@reddit

Just reminder, feel free to react to the post but don't comment something meaningless like '+1' because everyone subscribed to the thread will be constantly spammed.

[Missed Connections] Find Me Very Strange or Unique Models!

Posted by amanda_cat@reddit | LocalLLaMA | View on Reddit | 12 comments

[-]

isr_431@reddit

Check out: https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule. It uses the 'abliteration' technique, but instead of mitigating refusals it's used to make the model depressed/melancholic.

Mistral AI releases (API-only for now it seems) Mistral Large 3 and Pixtral Large

Posted by Vivid_Dot_6405@reddit | LocalLLaMA | View on Reddit | 99 comments

[-]

isr_431@reddit

iirc lmstudio has support for pixtral on macOS and exllamav2 also has support for it

Building a Mini PC for aya-expanse-8b Inference - Recommendations Needed!

Posted by Whiplashorus@reddit | LocalLLaMA | View on Reddit | 35 comments

[-]

isr_431@reddit

That's pretty interesting to hear since Mistral is a French company. It's good to hear that you've found a better option

Building a Mini PC for aya-expanse-8b Inference - Recommendations Needed!

Posted by Whiplashorus@reddit | LocalLLaMA | View on Reddit | 35 comments

[-]

isr_431@reddit

This is off topic, but have you tried Mistral Nemo/Ministral for English-to-French translation?

Qwen 2.5 7B Added to Livebench, Overtakes Mixtral 8x22B and Claude 3 Haiku

Posted by isr_431@reddit | LocalLLaMA | View on Reddit | 62 comments

[-]

isr_431@reddit (OP)

That's even more impressive if the number of parameters includes the vision model, bringing the true model size down to \~7b.

Qwen 2.5 7B Added to Livebench, Overtakes Mixtral 8x22B and Claude 3 Haiku

Posted by isr_431@reddit | LocalLLaMA | View on Reddit | 62 comments

[-]

isr_431@reddit (OP)

Qwen2.5 Coder 32b was also added. It is ranked third highest for coding only behind both versions of Claude 3.5 Sonnet

Your Experience with Small Language Models

Posted by numinouslymusing@reddit | LocalLLaMA | View on Reddit | 34 comments

[-]

isr_431@reddit

Qwen 2.5 Coder 1.5b/3b are perfect for code completion with Continue.

Why do we not have Loras like Civitai does for diffusion models?

Posted by FesseJerguson@reddit | LocalLLaMA | View on Reddit | 40 comments

[-]

isr_431@reddit

With SD models getting larger and harder to run (like Flux), a lot of people are using quants like GGUF. I wonder if SD's approach to loras will have to change

Best models under 8GB of VRAM?

Posted by HRudy94@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

isr_431@reddit

Second qwen 2.5 7b! Try out coder and math variants for specific tasks

Thoughts on Ministral 8B?

Posted by Amgadoz@reddit | LocalLLaMA | View on Reddit | 35 comments

[-]

isr_431@reddit

Sorry for the reply; was meaning to but totally forgot. Small models (\~7b) are pretty unreliable for tool calling. The best models in that range would Qwen 2.5 7B and Hermes 3 8B. You will get much better results if you use a larger model. Qwen 2.5 14B and Mistral Nemo are both much better for this purpose.

Thoughts on Ministral 8B?

Posted by Amgadoz@reddit | LocalLLaMA | View on Reddit | 35 comments

[-]

isr_431@reddit

Wizard Vicuna is an old model that is outperformed by many newer ones. The best uncensored model is undoubtedly Tiger Gemma v3 by TheDrummer, based on Gemma 2 9b.

OpenCoder: open and reproducible code LLM family which matches the performance of Top-Tier Code LLM

Posted by asb@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

isr_431@reddit

The Qwen team has already taken the new version down from their official HuggingFace page.

what's the cheapest hardware I can run Llama 3.2 11b (image inference) on?

Posted by dirtyring@reddit | LocalLLaMA | View on Reddit | 7 comments

[-]

isr_431@reddit

Ollama supports quantisized Llama 3.2 vision

LLM overkill is real: I analyzed 12 benchmarks to find the right-sized model for each use case 🤖

Posted by medi6@reddit | LocalLLaMA | View on Reddit | 80 comments

[-]

isr_431@reddit

Can you make it easier to select small models? Choosing low latency still returns Llama 3.1 70b among other options. I would recommend adding these coding models: Qwen 2.5 Coder 7b, Yi Coder 9b, CodeGeex4 All 9b.

So where’s Qwen2.5-Coder-32B?

Posted by Balance-@reddit | LocalLLaMA | View on Reddit | 27 comments

[-]

isr_431@reddit

Response from Qwen member: [https://x.com/huybery/status/1853828761164321135](https://x.com/huybery/status/1853828761164321135)

Your best 3b model? Llama 3.2, kwen 2.5 or Phi 3.5?

Posted by noaibot@reddit | LocalLLaMA | View on Reddit | 12 comments

[-]

isr_431@reddit

Phi 3.5 for math and reasoning. Gemma 2 2b for creative writing. Llama 3 3b/Qwen 2.5 3b for instruction following/roleplay/chatting.

Is there anything that beats Mistral-Nemo 12b in coding that's still smaller than a Llama 3.1 70b quant?

Posted by ForsookComparison@reddit | LocalLLaMA | View on Reddit | 30 comments

[-]

isr_431@reddit

There are many smaller models which beat Nemo coding. Not sure what sources you're using, but are many coding benchmarks where you can find better models. I mainly use Qwen2.5 Cider 7b. You will also have results with Yi Coder 9b and Deepseek Coder Lite.

Meta releases an open version of Google's NotebookLM

Posted by isr_431@reddit | LocalLLaMA | View on Reddit | 135 comments

[-]

isr_431@reddit (OP)

True. My first impression with NotebookLM was how natural and coherent the voices were, with a surprising amount of emotion.

Cohere releases Aya Expanse multilingual AI model family

Posted by umarmnaq@reddit | LocalLLaMA | View on Reddit | 40 comments

[-]

isr_431@reddit

Has this not been out for a week already? I've been using it in Ollama since then. Gemma 2 9b is better at translation, but supports fewer languages. Qwen 2.5 is still the best for most Asian languages.

Best alternative to LM Studio?

Posted by PaytonAndHolyfield@reddit | LocalLLaMA | View on Reddit | 17 comments

[-]

isr_431@reddit

Open Webui, supports Ollama and OpenAI-compatible endpoints (including vLLM)

Best 3B model nowadays?

Posted by mr_house7@reddit | LocalLLaMA | View on Reddit | 38 comments

[-]

isr_431@reddit

Reasoning and logic: Phi 3.5 mini General purpose: Llama 3.2 3b Creative writing: Gemma 2 2b Speed: Qwen2.5 3b

Petition to auto-delete anything that mentions Matt Shumer, "Reflection", or any link to his Twitter or any affiliated Twitter accounts (Sahil, etc)

Posted by XMasterrrr@reddit | LocalLLaMA | View on Reddit | 46 comments

[-]

isr_431@reddit

This! I've already seen 3 psots today about him