isr_431

mistralai/Mistral-Small-24B-Base-2501 · Hugging Face

Posted by Dark_Fire_12@reddit | LocalLLaMA | View on Reddit | 87 comments

Mark Zuckerberg on Llama 4 Training Progress!

Posted by ybdave@reddit | LocalLLaMA | View on Reddit | 90 comments

DeepSeek R1 takes second place on the multi-player benchmark for cooperation, negotiation, and deception.

Posted by zero0_one1@reddit | LocalLLaMA | View on Reddit | 41 comments

Just canceled my OpenAI Plus subscription (for now). Been running DeepSeek-R1 14b locally on my home workstation. I'll probably renew it if OpenAI launches something worthy for Plus tier by then.

Posted by CarbonTail@reddit | LocalLLaMA | View on Reddit | 165 comments

Major changes are coming this year. Buckle up.

Posted by estebansaa@reddit | LocalLLaMA | View on Reddit | 134 comments

What is your method to find good NSFW models? preferably for role playing

Posted by Ok_Appointment2593@reddit | LocalLLaMA | View on Reddit | 188 comments

isr_431@reddit

I tested the model ages ago and I've forgotten the results. However, I would recommend using Lyra Gutenberg or Violet Twilight instead.

UGI-Leaderboard Remake! New Political, Coding, and Intelligence benchmarks

Posted by DontPlanToEnd@reddit | LocalLLaMA | View on Reddit | 15 comments

isr_431@reddit

Thank you for all the hard work you've put into this. I've been following it since the beginning and requested way too many models to be added. Can you bring back the ability to view models within a certain parameter size range, but using a slider instead of the checkboxes used in the previous iteration? Also, why do a lot of proprietary models have a higher UGI score than before? I swear that every Anthropic model had a rock-bottom score. Or maybe it's just me hallucinating 🤣

Phi-4 has been released

Posted by paf1138@reddit | LocalLLaMA | View on Reddit | 229 comments

Phi-4 is insanely good at rephrasing the last message for multi-turn RAG questions

Posted by LinkSea8324@reddit | LocalLLaMA | View on Reddit | 38 comments

Xiaomi recruits key DeepSeek researcher to lead its AI lab.

Posted by sb5550@reddit | LocalLLaMA | View on Reddit | 18 comments

isr_431@reddit

Bytedance hasn't had a bad history with open source. They created SDXL Lightning, which is used as a base model by many finetunes today.

Dolphin 3.0 !

Posted by Evening_Action6217@reddit | LocalLLaMA | View on Reddit | 54 comments

I don't get it.

Posted by AlgorithmicKing@reddit | LocalLLaMA | View on Reddit | 111 comments

isr_431@reddit

Just because it's a MoE doesn't mean it will run identically to a 32b model. You still need to be able to fit the entire model in VRAM or RAM.
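
To put rough numbers on that, here is a back-of-the-envelope sketch. The 100B total / 32B active split and the 4-bit quantization are hypothetical figures chosen for illustration, not the specs of any particular model, and the estimate covers weights only (no KV cache or runtime overhead).

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Hypothetical MoE: 100B parameters in total, of which 32B are active per token.
# Every expert has to be resident in memory, so the total count sets the footprint.
moe_footprint = weight_memory_gb(100, 4)

# A dense 32B model at the same (assumed) 4-bit quantization, for comparison.
dense_footprint = weight_memory_gb(32, 4)

print(f"MoE (100B total, 32B active): ~{moe_footprint:.0f} GB")
print(f"Dense 32B:                    ~{dense_footprint:.0f} GB")
```

The active parameter count mostly affects speed per token; the memory you need to load the model follows the total parameter count.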

Looks like deepseekv3 API is up

Posted by shing3232@reddit | LocalLLaMA | View on Reddit | 24 comments

Has anyone tested phi4 yet? How does it perform?

Posted by LLMtwink@reddit | LocalLLaMA | View on Reddit | 26 comments

isr_431@reddit

Over longer conversations the dry/bland prose (similar to phi3) becomes pretty noticeable. For coding, it is definitely outclassed by Qwen2.5 14b and its coder variant. I haven't tested other use cases.

Just installed my first local LLM (Llama3.2)

Posted by garrincha-zg@reddit | LocalLLaMA | View on Reddit | 13 comments

isr_431@reddit

I personally don't see any issue with the output. However, don't expect Llama 3.2 to match the performance of closed models like GPT 4o as they are much larger. There are larger models like Qwen2.5 72b and Mistral Large which come close but require expensive hardware.

TIL Llama 3.3 can do multiple tool calls and tool composition in a single shot

Posted by zra184@reddit | LocalLLaMA | View on Reddit | 21 comments

isr_431@reddit

Anthropic's implementation is even better. It allows for subsequent tool calls after the first one. I wish all APIs would adopt this.

Microsoft Phi-4 GGUF available. Download link in the post

Posted by matteogeniaccio@reddit | LocalLLaMA | View on Reddit | 132 comments

Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning

Posted by metalman123@reddit | LocalLLaMA | View on Reddit | 211 comments

Open models wishlist

Posted by hackerllama@reddit | LocalLLaMA | View on Reddit | 238 comments

isr_431@reddit

I personally don't care for multimodality, and I'd rather have a smaller model that excels at text-based tasks. Multimodal support also takes ages to be implemented in llama.cpp (no judgement, just an observation). I'm sure long context has been mentioned many times; 128k would be great. Proper system prompt and tool calling support too. And less censorship: it would be unrealistic to expect a fully uncensored model, but maybe reduce the number of unnecessary refusals? Seeing how Gemini Flash 8b performs gives me high hopes for Gemma 3! Thanks

Ollama has merged in K/V cache quantisation support, halving the memory used by the context

Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 139 comments

Should I get a 14 inch M4 Max 128GB for 123B models?

Posted by TheLocalDrummer@reddit | LocalLLaMA | View on Reddit | 52 comments

isr_431@reddit

This is TheDrummer you're talking to; check his HF page to see the types of models he publishes. ChatGPT certainly isn't suitable for that purpose.

Since things are moving so quickly how do you stay up to date on best current tools and how to use them?

Posted by TryKey925@reddit | LocalLLaMA | View on Reddit | 48 comments

isr_431@reddit

Here are some organizations I follow: Mistral, Qwen, Cohere, 01, Meta, Google and InternLM. There are probably more that I haven't listed. You're right that benchmarks aren't the most reliable. I mainly check the Open LLM Leaderboard from time to time because of the sheer volume of models.

Most intelligent uncensored model under 48GB VRAM?

Posted by PMMEYOURSMIL3@reddit | LocalLLaMA | View on Reddit | 73 comments

Closed source model size speculation

Posted by redjojovic@reddit | LocalLLaMA | View on Reddit | 22 comments

isr_431@reddit

Please correct me if I'm wrong, but the 8b parameter count of Gemini Flash would be including the vision model. This would bring the 'true' parameter size to around 7b, which is very impressive for its performance.

Someone just created a pull request in llama.cpp for Qwen2VL support!

Posted by Many_SuchCases@reddit | LocalLLaMA | View on Reddit | 36 comments

isr_431@reddit

Just a reminder: feel free to react to the post, but don't comment something meaningless like '+1', because everyone subscribed to the thread will be constantly spammed.

[Missed Connections] Find Me Very Strange or Unique Models!

Posted by amanda_cat@reddit | LocalLLaMA | View on Reddit | 12 comments

isr_431@reddit

Check out: https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule. It uses the 'abliteration' technique, but instead of mitigating refusals it's used to make the model depressed/melancholic.

Mistral AI releases (API-only for now it seems) Mistral Large 3 and Pixtral Large

Posted by Vivid_Dot_6405@reddit | LocalLLaMA | View on Reddit | 99 comments

Building a Mini PC for aya-expanse-8b Inference - Recommendations Needed!

Posted by Whiplashorus@reddit | LocalLLaMA | View on Reddit | 35 comments

Qwen 2.5 7B Added to Livebench, Overtakes Mixtral 8x22B and Claude 3 Haiku

Posted by isr_431@reddit | LocalLLaMA | View on Reddit | 62 comments

isr_431@reddit (OP)

That's even more impressive if the number of parameters includes the vision model, bringing the true model size down to ~7b.

Your Experience with Small Language Models

Posted by numinouslymusing@reddit | LocalLLaMA | View on Reddit | 34 comments

Why do we not have Loras like Civitai does for diffusion models?

Posted by FesseJerguson@reddit | LocalLLaMA | View on Reddit | 40 comments

isr_431@reddit

With SD models getting larger and harder to run (like Flux), a lot of people are using quants like GGUF. I wonder if SD's approach to loras will have to change

Best models under 8GB of VRAM?

Posted by HRudy94@reddit | LocalLLaMA | View on Reddit | 21 comments

Thoughts on Ministral 8B?

Posted by Amgadoz@reddit | LocalLLaMA | View on Reddit | 35 comments

isr_431@reddit

Sorry for the late reply; I was meaning to but totally forgot. Small models (~7b) are pretty unreliable for tool calling. The best models in that range would be Qwen 2.5 7B and Hermes 3 8B. You will get much better results if you use a larger model. Qwen 2.5 14B and Mistral Nemo are both much better for this purpose.

Thoughts on Ministral 8B?

Posted by Amgadoz@reddit | LocalLLaMA | View on Reddit | 35 comments

isr_431@reddit

Wizard Vicuna is an old model that is outperformed by many newer ones. The best uncensored model is undoubtedly Tiger Gemma v3 by TheDrummer, based on Gemma 2 9b.

OpenCoder: open and reproducible code LLM family which matches the performance of Top-Tier Code LLM

Posted by asb@reddit | LocalLLaMA | View on Reddit | 21 comments

what's the cheapest hardware I can run Llama 3.2 11b (image inference) on?

Posted by dirtyring@reddit | LocalLLaMA | View on Reddit | 7 comments

LLM overkill is real: I analyzed 12 benchmarks to find the right-sized model for each use case 🤖

Posted by medi6@reddit | LocalLLaMA | View on Reddit | 80 comments

isr_431@reddit

Can you make it easier to select small models? Choosing low latency still returns Llama 3.1 70b among other options. I would recommend adding these coding models: Qwen 2.5 Coder 7b, Yi Coder 9b, CodeGeex4 All 9b.

So where’s Qwen2.5-Coder-32B?

Posted by Balance-@reddit | LocalLLaMA | View on Reddit | 27 comments

isr_431@reddit

Response from Qwen member: https://x.com/huybery/status/1853828761164321135

Your best 3b model? Llama 3.2, kwen 2.5 or Phi 3.5?

Posted by noaibot@reddit | LocalLLaMA | View on Reddit | 12 comments

isr_431@reddit

Phi 3.5 for math and reasoning. Gemma 2 2b for creative writing. Llama 3.2 3b/Qwen 2.5 3b for instruction following/roleplay/chatting.

Is there anything that beats Mistral-Nemo 12b in coding that's still smaller than a Llama 3.1 70b quant?

Posted by ForsookComparison@reddit | LocalLLaMA | View on Reddit | 30 comments

isr_431@reddit

There are many smaller models which beat Nemo at coding. Not sure what sources you're using, but there are many coding benchmarks where you can find better models. I mainly use Qwen2.5 Coder 7b. You will also have good results with Yi Coder 9b and Deepseek Coder Lite.

Meta releases an open version of Google's NotebookLM

Posted by isr_431@reddit | LocalLLaMA | View on Reddit | 135 comments

isr_431@reddit (OP)

True. My first impression with NotebookLM was how natural and coherent the voices were, with a surprising amount of emotion.

Cohere releases Aya Expanse multilingual AI model family

Posted by umarmnaq@reddit | LocalLLaMA | View on Reddit | 40 comments

isr_431@reddit

Has this not been out for a week already? I've been using it in Ollama since then. Gemma 2 9b is better at translation, but supports fewer languages. Qwen 2.5 is still the best for most Asian languages.

Best alternative to LM Studio?

Posted by PaytonAndHolyfield@reddit | LocalLLaMA | View on Reddit | 17 comments

Best 3B model nowadays?

Posted by mr_house7@reddit | LocalLLaMA | View on Reddit | 38 comments

Petition to auto-delete anything that mentions Matt Shumer, "Reflection", or any link to his Twitter or any affiliated Twitter accounts (Sahil, etc)

Posted by XMasterrrr@reddit | LocalLLaMA | View on Reddit | 46 comments