Pyros-SD-Models

M5 vs DGX Spark vs Strix Halo vs RTX 6000

Posted by Signal_Ad657@reddit | LocalLLaMA | View on Reddit | 261 comments

Gift to myself : tiny lab

Posted by Final-Data-1410@reddit | LocalLLaMA | View on Reddit | 86 comments

Pyros-SD-Models@reddit

You sure? If you prompt the bot with "write as if you were having a stroke", it will produce similar text.

"Hardware is the only moat" - Should we buy new hardware now or wait?

Posted by Alan_Silva_TI@reddit | LocalLLaMA | View on Reddit | 177 comments

>Open models might not reach the level of utility where you feel like you absolutely must own local hardware.

How do people keep saying this after getting from llama3 to qwen3.6 in not even two years?

Google is making local AI available to mainstream users ;)

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 161 comments

Pyros-SD-Models@reddit

This sub is also anti-AI. According to this sub we hit capability limits every few months, but this time the wall is real. In 2024 it was "lol scammers" whenever anyone talked about how AI would soon be able to do proper dev work, plus "AI won't ever be able to do this" and whatnot.

[Release] AugmentedQuill 0.9.0: Open-source AI story-writing GUI

Posted by StableLlama@reddit | LocalLLaMA | View on Reddit | 22 comments

Pyros-SD-Models@reddit

and Silas... in a library...

Switched from Qwen3.6 35b-a3b to Qwen3.6 27b mid coding and it's noticeably better!

Posted by LocalAI_Amateur@reddit | LocalLLaMA | View on Reddit | 95 comments

Pyros-SD-Models@reddit

It looks like the one game every LLM on earth somehow wants to implement if you ask it for a small puzzle game: laser-refractor-puzzles :D but yes, dense qwen best qwen

Confirmed: SWE Bench is now a benchmaxxed benchmark

Posted by rm-rf-rm@reddit | LocalLLaMA | View on Reddit | 105 comments

Pyros-SD-Models@reddit

If in a decontaminated benchmark like SWE-ReBench my 6-month-old medium model is on par with Opus 4.6, but in SWE-Bench the same Opus leads by 15%, then yes, that looks like pretty comical benchmaxxing by Anthropic. And a good opportunity to say something imho https://swe-rebench.com/

I can’t believe I can say “ugh I don’t feel like fixing this function, it’s too complex” and I can literally just tell my computer to fix it for me. I didn’t understand what they meant by “people will start paying for intelligence” but now I do.

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 140 comments

Pyros-SD-Models@reddit

I’m with Terence Tao on this. Perhaps the ability to predict n+1 as accurately as possible based on n₀ to n is exactly intelligence. So yeah, being able to guess the next word, why shouldn't it be intelligence? https://www.reddit.com/r/accelerate/comments/1qo4he1/terence_tao_says_the_era_of_ai_is_proving_that/

What do you consider to be the minimum performance (t/s) for local Agent workflows?

Posted by MexInAbu@reddit | LocalLLaMA | View on Reddit | 61 comments

Pyros-SD-Models@reddit

lol this skill is amazing. And genius.

This is where we are right now, LocalLLaMA

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 491 comments

Pyros-SD-Models@reddit

I also think people are speaking from belief rather than actual experience, because they haven’t really tried Qwen3.6-27B. For coding agent tasks, Qwen3.6-27B inside Pi mops the floor with Sonnet inside Claude Code. Or they’re judging adjacent tasks, but yeah, obviously Qwen3.6-27B will not meticulously search half the internet and write the most perfect plan ever. It can do it, but it doesn’t extract the learnings as well as something like Opus or GPT-Pro would. But nobody is talking about that, since OP is clearly referring to coding tasks, not planning tasks.

What do you want me to try?

Posted by amitbahree@reddit | LocalLLaMA | View on Reddit | 71 comments

Pyros-SD-Models@reddit

Anime Boobas with SD 1.5

Deepseek has released DeepEP V2 and TileKernels.

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 51 comments

Pyros-SD-Models@reddit

You should look up how many papers OpenAI is releasing in a year and you will end up with a number higher than the DeepSeek papers, but OpenAI bad, obviously. Needs to be mentioned even in non-OpenAI related threads lol.

Bonsai models are pure hype: Bonsai-8B is MUCH dumber than Gemma-4-E2B

Posted by WeGoToMars7@reddit | LocalLLaMA | View on Reddit | 71 comments

Pyros-SD-Models@reddit

It actually lives up to the hype, but you need to understand that to reach 1-bit, something has to die. Prism’s whole claim is that if you sacrifice basically all factual knowledge of a model, you can still preserve its reasoning abilities, and yes, it does. Which is amazing, because it suggests that reasoning does not depend heavily on factual knowledge. But this also means that asking the model random factual questions is not a good evaluation and does not say anything about the core claim. It only validates that they did, in fact, nuke factuality.

Bonsai models are pure hype: Bonsai-8B is MUCH dumber than Gemma-4-E2B

Posted by WeGoToMars7@reddit | LocalLLaMA | View on Reddit | 71 comments

Pyros-SD-Models@reddit

I mean, it is also not fair to judge models based on three random questions (google “sample size”), and your questions do not disprove their benchmark scores either. The benchmarks are open, you can literally run the exact same evaluations yourself.

I find it quite amusing that people here love to shit-talk benchmarks, and then you look at how this sub actually tests models and it’s just some dude asking three questions. No methodology, no ablations, no correlation study, no de-biasing. Nothing. Not even a definition of what is actually being measured. What does “dumb” even mean in your context? Are humans who don’t know bun dumber than people who do? How does “knowing bun” correlate with intelligence? Do they even correlate? Congrats, you just invented the worst benchmark possible.

And if you think this somehow disproves the work Prism did... you know, actual scientific work: building a theory, running experiments, measuring results (with quite a bit more than just three questions), and even providing the exact tooling in their repo so you can reproduce the experiments, which hundreds of people already did... then I have bad news for you.

And the benchmarks themselves make this pretty clear: there is not a single claim that Bonsai preserves factual grounding. The benchmark suite includes MMLU-R, MuSR, GSM8K, HumanEval+, IFEval, and BFCL. That covers reasoning, math, coding, instruction following, and function calling. Their core metric is “intelligence density per gigabyte.” The entire thesis is that 1-bit quantization preserves reasoning capability at a fraction of the size... not that it preserves encyclopedic knowledge. Those are different things.

It's literally the point of their work to sacrifice factual grounding for reasoning stability, and you just tested that they did in fact sacrifice factual grounding. Amazing.

Qwen3.6. This is it.

Posted by Local-Cardiologist-5@reddit | LocalLLaMA | View on Reddit | 420 comments

Pyros-SD-Models@reddit

Literally every model being discussed here "stole" shit to train on, so I find it somewhat amusing that people are all up in arms about ollama basically using open source as it is designed. you can argue about morality, but it's a very simple question: are they violating any licenses they are supposed to adhere to? no? end of story. llama.cpp chose its license with full awareness of what people would do with the software and the code, and if they would like people to behave a certain way they should have written it into their fcking license

Qwen3.6-35B-A3B released!

Posted by ResearchCrafty1804@reddit | LocalLLaMA | View on Reddit | 721 comments

Pyros-SD-Models@reddit

Wait, wasn't the top thread yesterday how the golden age of LLMs is now over

We have a new weight class...

Posted by LegacyRemaster@reddit | LocalLLaMA | View on Reddit | 123 comments

Pyros-SD-Models@reddit

Of course there are ways, especially in a commercial context. Most of those ways are called "employees who want to shit on their employer." I got like 5k bucks from Embarcadero just for ratting out my employer for using cracked Delphi versions. easiest money in my life.

We have a new weight class...

Posted by LegacyRemaster@reddit | LocalLLaMA | View on Reddit | 123 comments

Pyros-SD-Models@reddit

Just looked at the big open weight releases of the last 12 months and literally no lab is calling what they do “open source”. This sub was calling open weight “open source” during llama2 already, so if something hijacked anything then it is this very sub.

Could it be that this take is not too far fetched?

Posted by pier4r@reddit | LocalLLaMA | View on Reddit | 115 comments

Pyros-SD-Models@reddit

Perception drift is a well-understood and well-researched phenomenon. The fact that a single car wash trick question and the “perceived performance” opinions of some redditors are not proof of anything seems to be less well understood.

OpenCode concerns (not truely local)

Posted by Ueberlord@reddit | LocalLLaMA | View on Reddit | 185 comments

Pyros-SD-Models@reddit

Where does the idea of it being a local tool come from anyway? Like, their homepage mentions “local” only once, in “supports local models”.

Qwen3.5-35B-A3B is a gamechanger for agentic coding.

Posted by jslominski@reddit | LocalLLaMA | View on Reddit | 410 comments

Pyros-SD-Models@reddit

It's a setting in opencode

GLM-5 Officially Released

Posted by ResearchCrafty1804@reddit | LocalLLaMA | View on Reddit | 161 comments

Pyros-SD-Models@reddit

Good thing about this “run locally” play is that once it finally finishes processing the prompt I gave it, GLM-6 will already be released 😎

GLM-5 Officially Released

Posted by ResearchCrafty1804@reddit | LocalLLaMA | View on Reddit | 161 comments

Pyros-SD-Models@reddit

Buying their yearly MAX back when it was $350 was one of the better decisions I made in my life. Already paid for itself a couple of times over. https://preview.redd.it/b315tmg1kwig1.png?width=1252&format=png&auto=webp&s=73fd58f0cd8c854d656fba0cf078f5ee3744a3f3

GLM-5 Officially Released

Posted by ResearchCrafty1804@reddit | LocalLLaMA | View on Reddit | 161 comments

Pyros-SD-Models@reddit

>For GLM Coding Plan subscribers: Due to limited compute capacity, we’re rolling out GLM-5 to Coding Plan users gradually.
>Other plan tiers: Support will be added progressively as the rollout expands.

chillax you get your GLM-5.0

Hugging Face Is Teasing Something Anthropic Related

Posted by Few_Painter_5588@reddit | LocalLLaMA | View on Reddit | 234 comments

Pyros-SD-Models@reddit

Our client base is around 1k companies with over 1mil end users. Chinese models or any other open weight model in use = 0. I swear this sub is living in a parallel universe or something.

Am I the only one who feels that, with all the AI boom, everyone is basically doing the same thing?

Posted by Empty_Enthusiasm_167@reddit | LocalLLaMA | View on Reddit | 198 comments

Pyros-SD-Models@reddit

If you think google “copied” yahoo you are either 12 and therefore never used old yahoo or you have no clue of how their searches differ and the math behind it. Just because both make cars it doesn’t mean Ferrari is copying Ford. Die

Is Local Coding even worth setting up

Posted by Interesting-Fish6494@reddit | LocalLLaMA | View on Reddit | 103 comments

Pyros-SD-Models@reddit

Your employer should obviously pay for your tools, like they already pay for your Visual Studio or Office 365 or JetBrains suite. I’m sure even the slowest suit at your place is able to do the math on how many hours Claude Max has to save for it to be worth it, and see that this is literally a no-brainer investment.

zai-org/GLM-4.7-Flash · Hugging Face

Posted by Dark_Fire_12@reddit | LocalLLaMA | View on Reddit | 242 comments

Pyros-SD-Models@reddit

If the 60% swe bench really feels like the 60% swe bench you know from other LLMs in that category on real-world tasks, then this is not a competition anymore. It’s domination.

It works! Abliteration can reduce slop without training

Posted by -p-e-w-@reddit | LocalLLaMA | View on Reddit | 139 comments

Pyros-SD-Models@reddit

lol

Jensen Huang at CES on how open models have really revolutionized AI last year. “When AI is open, it proliferates everywhere.”

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 86 comments

Pyros-SD-Models@reddit

I swear this sub has the economic understanding of a broken LLaMA-2 quant. You know these “5M” or whatever numbers they like to throw around this week do not happen in a vacuum, right? Sure, you can probably train an o1-level model for $50k today. That does not mean the preceding research was free. And it obviously does not mean you can compare those $50k with whatever OpenAI paid in R&D, unless you have the economic understanding of a broken LLaMA-2 quant. I sometimes wonder what this sub is even using their local models for. It certainly is not to help themselves understand basic concepts. Let your bot explain to you what "marginal costs" and "total costs" are; perhaps you learn a thing or two today.

Career Advice in AI — Notes from an Andrew Ng Lecture

Posted by Dear-Success-1441@reddit | LocalLLaMA | View on Reddit | 56 comments

Pyros-SD-Models@reddit

If you don't manage to create working OAuth with bleeding-edge SOTA LLMs, then this is a you problem anyway, and a career in IT is the last thing you need to worry about. Go do some gardening instead or something. Andrew Ng is not talking to you.

Qwen released Qwen-Image-Edit-2511 — a major upgrade over 2509

Posted by Difficult-Cap-7527@reddit | LocalLLaMA | View on Reddit | 34 comments

Pyros-SD-Models@reddit

Imagine being so fixated on some random silicon valley twink, that you have to name him unprompted in a release thread of an amazing image model.

I outperformed BERT-Base on SNLI (96.19%) using a 52MB model trained entirely on my MacBook CPU. No Transformers, just Physics.

Posted by chetanxpatil@reddit | LocalLLaMA | View on Reddit | 21 comments

Pyros-SD-Models@reddit

bro, i literally wasted 10 minutes of my life to actually look at your code...

"Vector Collapse Engine" -> 2-layer MLP + 3 learned prototype vectors
"Basin Field" -> Online k-means clustering
"Quantum-Inspired Divergence (0.38)" wtf -> Cosine similarity with a hardcoded margin
"Dynamic Basin Spawning" -> Adding new cluster centroids when points are far from existing ones
"Geometric Energy" -> Mean squared error loss
"Zero Embeddings" -> ...literally uses nn.Embedding, i.e., standard word vectors.

The core architecture is a Skip-gram (Word2Vec, 2013) combined with Prototypical Networks (2017) and some online clustering logic from the 90s. They're not novel, not "physics-based," and definitely not quantum-anything. You can't just take things and rename them as you please.

also a license containing stuff like this should tell you everything:

>"If you are employed by a corporation, reviewing this code may contaminate your internal IP"

I don't know about my 'internal IP' but reviewing that code made me certainly dumber.
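
To make the renaming concrete, here is a rough sketch of what "online clustering that adds new centroids when points are far from existing ones" looks like, using cosine similarity against a hardcoded threshold. This is not OP's code: the class name `OnlineClusters`, the running-mean update, and reusing 0.38 as the spawn threshold are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusters:
    """Hypothetical online clustering with centroid spawning (illustration only)."""

    def __init__(self, spawn_threshold=0.38):
        # Assumption: 0.38 is reused here only because it is the hardcoded
        # number quoted in the comment; any threshold would work.
        self.spawn_threshold = spawn_threshold
        self.centroids = []
        self.counts = []

    def update(self, point):
        """Assign the point to its most similar centroid, or spawn a new one."""
        if self.centroids:
            sims = [cosine(point, c) for c in self.centroids]
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= self.spawn_threshold:
                # Running-mean update of the winning centroid.
                self.counts[best] += 1
                lr = 1.0 / self.counts[best]
                old = self.centroids[best]
                self.centroids[best] = [c + lr * (p - c) for c, p in zip(old, point)]
                return best
        # No centroid is similar enough: spawn a new one at this point.
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1


clusters = OnlineClusters()
labels = [clusters.update(p) for p in [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]]
print(labels, len(clusters.centroids))
```

The first two points land in one cluster and the last two spawn and fill a second one. Nothing here needs physics or quantum vocabulary to describe.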

Trained a chess LLM locally that beats GPT-5 (technically)

Posted by KingGongzilla@reddit | LocalLLaMA | View on Reddit | 62 comments

Pyros-SD-Models@reddit

> it wouldn't know what legal moves it can make past the opening.

There are literally dozens of papers showing that LLMs reverse-engineer game rules just from the moves they get fed and build internal world models from that. This paper shows that if OP had trained a bigger model, he would have observed a very interesting effect: the LLM plays better chess than the games it was trained on. https://arxiv.org/pdf/2406.11741v1

It just gets ignored because it's hard proof that LLMs are more than pure statistics and that they do real learning.

New model, microsoft/VibeVoice-Realtime-0.5B

Posted by edward-dev@reddit | LocalLLaMA | View on Reddit | 70 comments

Pyros-SD-Models@reddit

>How can you incorporate it into a product if you don't understand its limitations.

You don't? It's a research model, not a deployment-ready production model. They literally state that they don't recommend it for production use cases.

I outperformed BERT-Base on SNLI (96.19%) using a 52MB model trained entirely on my MacBook CPU. No Transformers, just Physics.

Posted by chetanxpatil@reddit | LocalLLaMA | View on Reddit | 21 comments

Pyros-SD-Models@reddit

Not to be a party pooper, but "your" idea already has a name: contrastive learning. You will also soon discover its limits, such as how easily it can overfit on the test set.

Ministral WebGPU: Run Mistral's new multimodal models 100% locally in your browser.

Posted by xenovatech@reddit | LocalLLaMA | View on Reddit | 15 comments

Pyros-SD-Models@reddit

3 years ago some guy in this sub told me I have no clue about machine learning and that I don’t work in AI, because I said that we would have realistic full HD video generation done before 2030. According to them, “reality is too complex and would need a completely different form of architecture”, and people working in the field would understand this simple truth. This was especially funny because we already had a quite advanced video model prototype at that time, and people in the field actually know that vision is way easier to scale than text, since one modality is perceptually bounded and the other isn’t, but what do I know.

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 43 comments

Pyros-SD-Models@reddit

>LLMs cannot introspect into their hidden layers just like people cannot accurately introspect about their brain's processes.

By the way, in case someone reads this five months later: They obviously cannot access the numerical values of their weights or other low-level technical properties. But they can introspect and show awareness of what they were trained on. https://arxiv.org/abs/2501.11120 and https://transformer-circuits.pub/2025/introspection/index.html and a few more.

Coursera Founder And AI Pioneer Andrew Ng Just Dropped An AI Reviewer That Performs At Human Level

Posted by AskGpts@reddit | LocalLLaMA | View on Reddit | 73 comments

Pyros-SD-Models@reddit

Medicine peer reviews are typically under 0.3: https://pubmed.ncbi.nlm.nih.gov/27533881/ Science in general usually sits under 0.4: https://arxiv.org/pdf/1404.0359 Closing in on 0.5 is amazing. Srsly, I've never seen a community that loves trash talking their own field of interest or hobby as much as this sub. Machine Learning is actually one of the most accurate fields of science in terms of papers that survive peer review or get disproven.

Coursera Founder And AI Pioneer Andrew Ng Just Dropped An AI Reviewer That Performs At Human Level

Posted by AskGpts@reddit | LocalLLaMA | View on Reddit | 73 comments

Pyros-SD-Models@reddit

ITT: people who don't even know what correlation means confidently debating frontier AI.

Watching someone look at a 0.4-0.5 correlation and go "lol coin toss" or "lol so bad" is wild. That's not what correlation measures, and if you think it is, you've already disqualified yourself from the conversation.

Human-human reviewer correlation at ICLR/NeurIPS has been in the 0.2-0.4 range for decades. That is the benchmark. So an AI hitting human-level variability isn't "random," it's literally matching the messiness of the real process.

If you genuinely believe a moderate linear relationship = "50/50 randomness," you shouldn't be weighing in on how frontier AI systems work. You should go back to high school because you clearly missed a huge portion of math there.

And to give you an intuition for what 0.4 looks like: human-human movie ratings usually correlate around 0.3-0.4 as well. That doesn't make movies bad, the raters bad, or the process noisy. It just means humans disagree a bit while still being in general agreement.

Medicine peer reviews are at <0.3: https://pubmed.ncbi.nlm.nih.gov/27533881/
Science in general is usually <0.4: https://arxiv.org/pdf/1404.0359

So if anything, ML is more concrete in correlation.

Coursera Founder And AI Pioneer Andrew Ng Just Dropped An AI Reviewer That Performs At Human Level

Posted by AskGpts@reddit | LocalLLaMA | View on Reddit | 73 comments

Pyros-SD-Models@reddit

Dude, no. Correlation is not “accuracy” and definitely not a “coin toss.” That’s just statistical illiteracy dressed up as a hot take. A coin toss baseline is zero correlation. Literally no relationship between two sets of scores.
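
If you want to see the difference for yourself, here is a small simulation sketch. It is a toy model, not anything from the linked papers: two hypothetical reviewers each score a paper as "true quality plus their own noise", with the noise variance set to 1.5 so that the theoretical correlation is 1 / (1 + 1.5) = 0.4. Two independent coin tossers are included as the baseline. All names (`pearson`, `simulate`) and parameters are made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, noise_var=1.5, seed=0):
    """Return (reviewer correlation, coin-toss correlation) for n simulated papers."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(noise_var)
    # Toy assumption: each paper has a latent quality both reviewers partly see.
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    reviewer_a = [q + rng.gauss(0.0, noise_sd) for q in quality]
    reviewer_b = [q + rng.gauss(0.0, noise_sd) for q in quality]
    # Baseline: two people flipping independent coins, ignoring the paper.
    coin_a = [rng.choice((0, 1)) for _ in range(n)]
    coin_b = [rng.choice((0, 1)) for _ in range(n)]
    return pearson(reviewer_a, reviewer_b), pearson(coin_a, coin_b)


r_reviewers, r_coins = simulate()
print(f"noisy reviewers: r = {r_reviewers:.2f}")
print(f"coin tossers:    r = {r_coins:.2f}")
```

The reviewers should come out near 0.4 and the coin tossers near 0. A 0.4 correlation means both scores track a shared signal through a lot of individual noise, which is a very different thing from no relationship at all.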

Bro and I thought I was an overthinker! vibeTHINKER on LM studio with no instructions.

Posted by Sufficient-Brain-371@reddit | LocalLLaMA | View on Reddit | 86 comments

Pyros-SD-Models@reddit

Feel free to point to those similar 1.5B non thinking models scoring the same as this model on AIME and other math sets.

Anthropic pushing again for regulation of open source models?

Posted by MasterDragon_@reddit | LocalLLaMA | View on Reddit | 262 comments

Pyros-SD-Models@reddit

Since their services are available to EU citizens they of course have to follow EU law, especially since you can host their models via AWS on EU ground. Also, downloading and using it to train their models isn’t what the court ruled was illegal. The fact that they stored the books is what got them into trouble. The court explicitly explained in its verdict that downloading books and using them as ephemeral input to training is fair use (see the Google Books case), and if Anthropic had deleted the books right away after ingestion there wouldn’t be any case. So no, “pirating data” is not obviously illegal; it depends on what you are doing with the data.

Reflection AI reached human-level performance (85%) on ARC-AGI v1 for under $10k and within 12 hours. You can run this code yourself, it’s open source.

Posted by balianone@reddit | LocalLLaMA | View on Reddit | 36 comments

Pyros-SD-Models@reddit

>"all wrapper applications are bad!"

People just say this because the alternative means that if a model performs badly at a task, it's my fault for orchestrating it wrongly and not the model's fault. And of course it's always the model's fault and not my shitty prompts or orchestration.

Minimax now offers Coding Plans, but is it worth it?

Posted by baykarmehmet@reddit | LocalLLaMA | View on Reddit | 29 comments

Pyros-SD-Models@reddit

with the sub (at least with GLM/z.AI) you get of course access to every future model they release, or at least so they promise on twitter.

Kimi K2 Thinking was trained with only $4.6 million

Posted by InternationalAsk1490@reddit | LocalLLaMA | View on Reddit | 157 comments

Pyros-SD-Models@reddit

And if Google hadn't put millions into DeepMind, and OpenAI hadn't proven by pouring some money into training GPT-2 that you can scale transformers into actual intelligence, then we would still be discussing the newest RNN architecture that can generate a sensical sentence every hundred generations, and still be praying to LeCun to finally finish up his symbolic AI paper that will lead us to the promised AGI land. Real vacuum only exists in theory.

OpenAI Pushes to Label Datacenters as ‘American Manufacturing’ Seeking Federal Subsidies After Preaching Independence

Posted by Ok-Breakfast-4676@reddit | LocalLLaMA | View on Reddit | 107 comments

Pyros-SD-Models@reddit

Since American taxpayers voted Trump into office, and this whole document is basically everything Trump wants to hear to “make America great again,” the American taxpayer should be ecstatic. Sprinkle in some Fox News hit pieces about how a Chinese offline model will steal your data, and the only solution is to prop up American AI. Easy peasy. This thread is so funny. The average American taxpayer has problems reading and writing English text, and this thread acts as if they’re somehow a huge wall of opposition. I’ll tell you a secret: not a single one of the people this document was written for gives a single shit about the American taxpayer. What’s he even going to do? Get his Cheeto ass off the couch and fight for his political voice? Haha, good joke.

World's strongest agentic model is now open source

Posted by Charuru@reddit | LocalLLaMA | View on Reddit | 281 comments

Pyros-SD-Models@reddit

> this leaderboard means literally nothing

It literally means exactly what it says it means, that Kimi is currently leading the T2 Telecom bench. Neither AA nor the creators of the benchmark are at fault when the smooth brains of this sub interpret more into it than that.

Kimi 2 is the #1 creative writing AI right now. better than sonnet 4.5

Posted by Excellent-Run7265@reddit | LocalLLaMA | View on Reddit | 149 comments
