mivog49274

i dedicate this meme to you r/LocalLLaMA

Posted by LPFchan@reddit | LocalLLaMA | View on Reddit | 44 comments

Breaking the music supply constraint

Posted by entsnack@reddit | LocalLLaMA | View on Reddit | 317 comments

mivog49274@reddit

Yeah, this guy is just way too far in the future, which makes him wrong. He just speedran all of the flaws and ills produced by super capable automated systems, a global atomised system of entertainment in a dystopian society... What's goofy though is imagining someone being satisfied with the current state of (local!) generative AI for music while presenting himself as a music lover (but not a sweat lover, obviously). The setup is neat though! But the speakers... would rather be called sloppers here ;)

Next year we're getting 0.5T model from Grok

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 200 comments

meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 15 comments

DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 119 comments

mivog49274@reddit

Oddly enough, a few days ago I could send screenshots to V4 Pro, although the file upload feature was restricted to Flash. And yeah, it perfectly read what was in the picture.

Qwen 3.7 droped on Qwen Chat

Posted by Foxiya@reddit | LocalLLaMA | View on Reddit | 221 comments

Will there be any more Qwen3.6 series models?

Posted by cafedude@reddit | LocalLLaMA | View on Reddit | 102 comments

mivog49274@reddit

Qwen will continue to deliver. Frontier ain't their hood. If they leave the local/mid-sized model space, they will just disappear. Deepseek V4 "preview" now has vision capabilities and is blowing their 3.6-Plus and 3.6-Max out of the water. The Qwen team is known to be the kings of optimization and power-per-parameter ratio. Either they break through to the frontier, which I think is unlikely, or they keep betting on the declining cost of intelligence, as Kilpatrick said, and keep their strategic edge in the field.

New "major breakthrough?" architecture SubQ

Posted by Daemontatox@reddit | LocalLLaMA | View on Reddit | 37 comments

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face

Posted by FaustAg@reddit | LocalLLaMA | View on Reddit | 15 comments

mivog49274@reddit

It was discussed here a few days ago: https://reddit.com/r/LocalLLaMA/comments/1szrbub/qwenscope_official_sparse_autoencoders_saes_for/ It's a kind of MRI for neural networks that lets you associate clusters of neurons with "concepts" that are intelligible to us. Anthropic were the first I saw mentioning this kind of tool (without ever sharing them?...). There is a rabbit-hole starting point posted in the thread I linked above: https://www.neuronpedia.org/
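
For anyone new to the idea, here is a toy sketch of what a sparse autoencoder does to a single activation vector: it expands the vector into many candidate features, keeps only the few strongest, then reconstructs the input from those. Everything in it is an assumption for illustration (the `sae_forward` name, the top-k sparsity rule, the random weights, the tiny dimensions); it is not the Qwen or Anthropic implementation, and a real SAE is trained on model activations rather than initialized at random.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Toy top-k sparse autoencoder forward pass (hypothetical helper).

    W_enc has shape (n_features, d_model); W_dec has shape (n_features, d_model).
    Returns the sparse feature vector and the reconstruction of x.
    """
    # Encode: affine map followed by ReLU.
    acts = [max(0.0, p + b) for p, b in zip(matvec(W_enc, x), b_enc)]
    # Sparsify: keep only the k largest activations, zero out the rest.
    keep = set(sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k])
    feats = [a if i in keep else 0.0 for i, a in enumerate(acts)]
    # Decode: reconstruction is the bias plus a weighted sum of decoder rows.
    x_hat = list(b_dec)
    for i, f in enumerate(feats):
        if f != 0.0:
            for j in range(len(x_hat)):
                x_hat[j] += f * W_dec[i][j]
    return feats, x_hat

# Made-up sizes and random weights, purely for demonstration.
random.seed(0)
d_model, n_features, k = 4, 16, 3
W_enc = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]
W_dec = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model
x = [random.gauss(0, 1) for _ in range(d_model)]

feats, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
active = [i for i, f in enumerate(feats) if f != 0.0]
print("active feature indices:", active)
```

The interpretability part comes afterwards: once trained, each feature index that lights up is inspected across many inputs to see which "concept" it tends to fire on.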

Decreased Intelligence Density in DeepSeek V4 Pro

Posted by Mindless_Pain1860@reddit | LocalLLaMA | View on Reddit | 90 comments

mivog49274@reddit

This one will shatter mountains and make rivers of sweat and tears flow, as its older sibling 3.1 did. I really trust the incremental thrust power of Deepseek, in addition to the fact that this model seems to be a "preview". With the expected price drop, this will certainly be something.

Qwen 3.6 Max Preview just went live on the Qwen Chat website. It currently has the highest AA-Intelligence Index score among Chinese models (52) (Will it be open source?)

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 92 comments

mivog49274@reddit

Max is a well-known model series in their ecosystem, since Qwen 2.5? It's the scaled-up version of each generation's training loop; I think it's about 1.5T in size, never released, and always beaten in benchmarks by their next generation of open models. I don't know why "incremental" updates (.3, .4...) in all the recently released models (GLM, GPT, Claude, etc.) deliver much more powerful models than in previous months; there seems to be a general acceleration since the end of 2025. Really hope to have a Bonsai Qwen3.6-397B-A17B or Qwen3.6-122B-A10B, the 3.6 update was indeed quite a jump!

LLM Neuroanatomy III - LLMs seem to think in geometry, not language

Posted by Reddactor@reddit | LocalLLaMA | View on Reddit | 100 comments

mivog49274@reddit

Man, come on. Don't fall for the brainless automated "AI slop" comments; it's becoming a meta signal. "AI slop" signaling is becoming slop in itself. We need this parallel/peripheral community-driven research to exist, and you are becoming an important actor in it -- don't stop!

[Model Release] I trained a 9B model to be agentic Data Analyst (Qwen3.5-9B + LoRA). Base model failed 100%, this LoRA completes 89% of workflows without human intervention.

Posted by Awkward_Run_9982@reddit | LocalLLaMA | View on Reddit | 45 comments

mivog49274@reddit

is banana bread concrete or a concrete demand would include concrete inside the banana bread recipe ? nano bananabread-8B is killer though (hit me if you want huggingface link) good job btw smh

One year later: this question feels a lot less crazy

Posted by gamblingapocalypse@reddit | LocalLLaMA | View on Reddit | 48 comments

mivog49274@reddit

Check out SimpleBench, Fiction.liveBench and eqbench.com, where the gaps with o3 look different, to get a less narrow viewpoint when comparing model performance. We should actually aggregate all the possible benchmarks for the two in order to have the slightest idea of such a comparison.
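
One rough way to do that kind of aggregation, sketched under loud assumptions: normalize each benchmark to z-scores so that different scales become comparable, then average per model. The `aggregate` function, the benchmark names, the model names and every number below are made-up placeholders, not real results.

```python
from statistics import mean, pstdev

def aggregate(scores):
    """Average per-benchmark z-scores for each model (hypothetical helper).

    scores maps benchmark name -> {model name -> raw score}.
    Assumes higher is better on every benchmark.
    """
    per_model = {}
    for by_model in scores.values():
        mu = mean(by_model.values())
        sigma = pstdev(by_model.values()) or 1.0  # avoid dividing by zero
        for model, raw in by_model.items():
            per_model.setdefault(model, []).append((raw - mu) / sigma)
    return {model: mean(zs) for model, zs in per_model.items()}

# Placeholder numbers on two differently scaled benchmarks.
scores = {
    "bench_a": {"model_x": 40.0, "model_y": 60.0, "model_z": 50.0},
    "bench_b": {"model_x": 0.60, "model_y": 0.90, "model_z": 0.90},
}

combined = aggregate(scores)
for model, z in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {z:+.2f}")
```

Equal weighting is itself a choice: a benchmark where every model scores about the same gets as much say as one that separates them clearly, so the result is only a starting point.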

We aren’t even close to AGI

Posted by CrimsonShikabane@reddit | LocalLLaMA | View on Reddit | 314 comments

mivog49274@reddit

AGI = a threshold of capabilities = adaptability. I get that "capabilities" can be vague, but it can be clearly stated empirically, step by step (it's done every time here for any LLM that gets "measured" and tested: real-world cases, formatting, function calling, making summaries, checking task states, etc.). The billion-dollar question is still how this level of capabilities can be reached (world model, next-token prediction, multi-modality, scale, hardware, etc.; what is strictly required to reach it), and Sam Altman clearly bet on LLMs. I personally think a hybrid transformer/neuro-symbolic approach is the key. A fully text-token AGI would be far easier to audit and control, as well as cheaper to run. I really hope we will be able to reach an in-computer, text-token AGI. A capable system like this would know what it doesn't know, and thus would try to play Elden Ring, give up after a few attempts, and explain why: my agent harness is stupidly unoptimized, I'm just a text-token navigator, etc.

Gemma

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 32 comments

mivog49274@reddit

I hope a massive number of people on 𝕏 will begin to say "Kilpatrick Kilpatrick Kilpatrick" to make him feel how annoying this way of communicating is becoming.

Analyzing Claude Code Source Code. Write "WTF" and Anthropic knows.

Posted by QuantumSeeds@reddit | LocalLLaMA | View on Reddit | 170 comments

mivog49274@reddit

Reading this makes me laugh, since I got frenziedly downvoted here by zealots (of what? I don't really know) for saying that Claude Code was listening and sending data... https://old.reddit.com/r/LocalLLaMA/comments/1r5nnhz/glm5_is_officially_on_nvidia_nim_and_you_can_now/ ...

Talking with the people that spam their AI slop is actually really fun!

Posted by EffectiveCeilingFan@reddit | LocalLLaMA | View on Reddit | 42 comments

mivog49274@reddit

Dude -- this sounds fantastic, really -- but -- we still did not -- see the seahorse emoji you mentioned a few times. Remember -- no seahorse emoji, no Gemini 5 api key! ;)

OpenCode concerns (not truely local)

Posted by Ueberlord@reddit | LocalLLaMA | View on Reddit | 185 comments

mivog49274@reddit

Thank you so much for the explanation, it feels so clear right now ! But I still didn't get why you mentioned an api key starting with -molt ? Can you re-print the api key in use so we can debug it together ?

What is Hunter Alpha?

Posted by MrMrsPotts@reddit | LocalLLaMA | View on Reddit | 144 comments

New benchmark just dropped.

Posted by ConfidentDinner6648@reddit | LocalLLaMA | View on Reddit | 140 comments

mivog49274@reddit

It would have been interesting to see each model's thinking process, library handling, search, etc. Great job on this benchmark idea!

More quantization visualization types (repost)

Posted by copingmechanism@reddit | LocalLLaMA | View on Reddit | 51 comments

GLM-5 is officially on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀

Posted by PreparationAny8816@reddit | LocalLLaMA | View on Reddit | 40 comments

mivog49274@reddit

to whoever downvoted me, just check claude code repo page on github please...

> Data collection, usage, and retention
>
> When you use Claude Code, we collect feedback, which includes usage data (such as code acceptance or rejections), associated conversation data, and user feedback submitted via the /bug command.

GLM-5 is officially on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀

Posted by PreparationAny8816@reddit | LocalLLaMA | View on Reddit | 40 comments

mivog49274@reddit

Claude Code clearly states that it collects data; that's the thing putting me off. I did not see any mention of how to turn this off.

Nemotron-3-nano:30b is a spectacular general purpose local LLM

Posted by DrewGrgich@reddit | LocalLLaMA | View on Reddit | 133 comments

ISRM: Infinitely Scalable Recursive Model

Posted by Available-Craft-5795@reddit | LocalLLaMA | View on Reddit | 12 comments

GLM 4.7 is out on HF!

Posted by KvAk_AKPlaysYT@reddit | LocalLLaMA | View on Reddit | 131 comments

Is Grokipedia available for fine-tuning?

Posted by Chance-Studio-8242@reddit | LocalLLaMA | View on Reddit | 30 comments

Celebrating 1 year anniversary of the revolutionary game changing LLM that was Reflection 70b

Posted by LosEagle@reddit | LocalLLaMA | View on Reddit | 20 comments

mivog49274@reddit

Like, was it the first chain-of-thought LLM release ever? Like wtf, what's the story behind why him and why release at that moment? Even if the perf numbers were made up, they matched pretty well the performance bump on benchmarks made by "real" CoT LLMs lol

What you think it will be..

Posted by Independent-Wind4462@reddit | LocalLLaMA | View on Reddit | 139 comments

deepseek-ai/DeepSeek-V3.1-Base · Hugging Face

Posted by xLionel775@reddit | LocalLLaMA | View on Reddit | 196 comments

mivog49274@reddit

> https://deepseek.ai/blog/deepseek-v31, 25th of March 2025. One day after V3-0324. It's either a new model, or the base model for 0324. But the blog post from March mentions a 1M context window, so yeah, I'm kind of confused right now. Maybe it's another "small but big" update.

deepseek-ai/DeepSeek-V3.1-Base · Hugging Face

Posted by xLionel775@reddit | LocalLLaMA | View on Reddit | 196 comments

mivog49274@reddit

I think the blog writers may have mixed things up and used the name "3.1" for V3-0324. This matches the release dates: 2025-03-24 for the HF release and 2025-03-25 for the blog post. https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 It's either a new model, or the base model for 0324. But the blog post from March mentions a 1M context window, so yeah, I'm kind of confused right now. Maybe it's another "small but big" update.

GPT-OSS Benchmarks: How GPT-OSS-120B Performs in Real Tasks

Posted by facethef@reddit | LocalLLaMA | View on Reddit | 79 comments

We’re definitely keeping him up at night right now.

Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 35 comments

OpenAI's open-weight model will debut as soon as next week

Posted by phantasm_ai@reddit | LocalLLaMA | View on Reddit | 115 comments

Google releases MagentaRT for real time music generation

Posted by hackerllama@reddit | LocalLLaMA | View on Reddit | 81 comments

mivog49274@reddit

Sounds nice! Thanks for sharing, Gemma team! Any plan to embed an "intelligent" unit inside the system that knows the formal standards of music theory? For example, instead of only producing auto-regressively predicted tokens, the system would first choose a grid on which notes or rhythms are written or played, before generating. Or would curating such data be just nightmarish at the moment, because it would involve knowing each note played and each instrument chosen for each sample of the training set?

INTELLECT-2 finished training today

Posted by kmouratidis@reddit | LocalLLaMA | View on Reddit | 21 comments

Qwen3-30B-A3B solves the o1-preview Cipher problem!

Posted by sunpazed@reddit | LocalLLaMA | View on Reddit | 18 comments

What's interesting is that Qwen's release is three months behind Deepseek's. So, if you believe Qwen 3 is currently the leader in open source, I don't think that will last, as R2 is on the verge of release. You can see the gap between Qwen 3 and the three-month-old Deepseek R1.

Posted by Select_Dream634@reddit | LocalLLaMA | View on Reddit | 53 comments

mivog49274@reddit

Is Q-235B-A22B *really* better than R1? I mean in real usage cases. Qwen delivers for sure, but I'm always skeptical about those benchmark numbers. If that's the case, it's just huge that we have o1 at home, moreover in a MoE, runnable on a shitty 16 GB RAM laptop (no offense to laptop owners).

Open-Weights Model next week?

Posted by MustBeSomethingThere@reddit | LocalLLaMA | View on Reddit | 75 comments

How do LLMs actually do this?

Posted by No-Conference-8133@reddit | LocalLLaMA | View on Reddit | 270 comments

mivog49274@reddit

This is really smart! Thank you for this demonstration. I also thought about prompting "try again" in order to "avoid" the "look closer" direction. I thought LLMs could process pictures as "pure tokens" and thus "see", in the sense of interpreting the [pixel] information into the latent space. This demonstrates this isn't the case. Maybe it's the difference between multimodal models (the impressive 4o and Gemini demos) and simple vision encoders.

NoLiMa: Long-Context Evaluation Beyond Literal Matching - Finally a good benchmark that shows just how bad LLM performance is at long context. Massive drop at just 32k context for all models.

Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 111 comments

o3-mini is now the SOTA coding model. It is truly something to behold. Procedural clouds in one-shot.

Posted by LocoMod@reddit | LocalLLaMA | View on Reddit | 228 comments

mivog49274@reddit

There will never be enough nodal/visual programming tools in the wild. I'm eager to test this one day, feel free to DM if you ever need a beta tester ;)