Still happy for y'all

[-]

craftogrammer@reddit

It's time for a VRAM downloader site, like we had RAM downloader. Things are changing so fast.

[-]

Gipetto@reddit

I feel very fortunate to be a Mac user.

[-]

No-Diet-8008@reddit

Welcome to the club, pal. Well, at least I've got 12GB of RAM, which I'm using to get about 4 t/s. It's certainly not Ava, but at least she's talking to me.

[-]

mhb-11@reddit

My 4060 8GB VRAM seems to not do anything useful. I totally feel for your 128MB 🥹

[-]

daddywookie@reddit

I’ve got a 2GB card spare for you bro. Otherwise I’m trying to get my 8GB Intel card to perform and wishing I had a job to buy an upgrade.

[-]

mzrdisi@reddit

What kind of models can you run on a 2GB card and how do you optimize them to be productive?

[-]

daddywookie@reddit

I asked ChatGPT exactly that and it basically said “lol, no”.

[-]

--Spaci--@reddit

Think for yourself. You can run Qwen3.5-0.8B at Q4_K_M, just not well, and you won't be very productive with it at anything.

[-]

russjr08@reddit

I assume that's probably what ChatGPT told them. "Yes, it's possible but..." hence the "lol, no" paraphrasing.

[-]

Kahvana@reddit

Qwen3.5-0.8B and 1B models like Granite 4.0 H 1B

[-]

rainbyte@reddit

I used LiquidAI LFM2 and LFM2.5 models on an Intel iGPU for some time, until I got a discrete GPU, and it was pretty good for learning about inference and getting it to solve some use cases. I still use it on the go on my phone :)

[-]

Excel_Document@reddit

Qwen 3 2B at Q4? Like 1 GB for the model and 1 GB for llama.cpp and attention.
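
For anyone who wants to sanity-check the "about 1 GB for the model" part, here is a rough back-of-the-envelope sketch. The 2e9 parameter count and the ~4.5 bits per weight figure for a 4-bit quant are assumptions for illustration, not measured values, and the runtime and attention overhead is not modeled at all.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed inputs: ~2 billion parameters, ~4.5 effective bits per weight.
# Both numbers are illustrative guesses, not taken from a model card.
model_gb = weight_memory_gb(2e9, 4.5)
print(f"weights: roughly {model_gb:.1f} GB")
```

Whatever is left of the 2 GB after the weights has to cover the runtime buffers and the KV cache, so context length is the first thing to shrink.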

[-]

mzrdisi@reddit

But is there utility? Or just novelty / testing?

[-]

Kahvana@reddit

Even an Intel UHD 605 with 8GB of system RAM can run Qwen3.5-2B at Q4_K_S with 2 t/s generation and 50 t/s prompt processing.

[-]

Eyelbee@reddit

You can do CPU inference and get decent t/s on some models.

[-]

Silver-Champion-4846@reddit

Only if you have enough system RAM.

[-]

CircularSeasoning@reddit

Enough is all you need.

.... more doesn't help.

[-]

Creative-Type9411@reddit

I have insane amounts of RAM and only 16GB of VRAM. 2 t/s on giant models is cool for like the first day, and then as soon as you run Qwen3.6 35B A3B MoE MTP and get like 40 tokens a second on the same exact hardware, you realize RAM only does so much. Even setups where people are splitting across GPUs take a hit compared to a single GPU because of the bandwidth needed for them to talk to each other.

But not being able to run them at all seems worse somehow, even though they're almost unusable.

[-]

CircularSeasoning@reddit

I have just enough RAM for Windows bloat + llama-server + Qwen3.6 35B A3B. My life is, in a sense, complete.

[-]

Puzll@reddit

One day 🙏

[-]

ttkciar@reddit

I'm pretty happy about the recent releases too, especially the larger models, even though I "only" have 32GB GPUs.

My attitude is, if AI Winter falls tomorrow, whatever is available now might be all we get to have thereafter, at least until the open source community acquires the hardware to advance the technology ourselves.

The hardware will trickle down into our hands via the second-hand market, eventually. The difference between technology that costs as much as a luxury sedan and technology that costs as much as a burrito is about eight years.

With such powerful models as GLM-5.1, MiMo-V2.5-Pro, and MiniMax-M2.7 available now, even if all advances stop right here, we'll be in a really happy place for many years to come. As better hardware becomes available, these more powerful models will be ours to use on that hardware.

We would also be able to leverage these larger models to make better small models via distillation, so if some of us get more powerful hardware and the rest lag behind with 12GB or 24GB GPUs, as our datasets and distillation pipelines improve, so should the models which will fit in those smaller GPUs.

[-]

JohnnyQuant@reddit

Nvidia has deals to buy back used datacenter GPUs so they can destroy them and keep prices artificially high. That is why the second-hand market is weak.

[-]

Athabasco@reddit

A quick Google search doesn’t provide a source for this claim, but I wouldn’t be surprised if it’s true. Do you have a source?

[-]

--Spaci--@reddit

That is the most absurd thing I have ever heard of Nvidia doing.

[-]

8P8OoBz@reddit

As Chinese competition closes in, this will kill their own market share.

[-]

ttkciar@reddit

I am very glad AMD does not do that.

My future upgrade plan from my current MI50/MI60 homelab is: MI210 --> MI350P --> MI455X

If the future of hardware in the open source community is non-Nvidia, then perhaps support for other vendors' hardware in PyTorch, Unsloth, etc. will improve.

[-]

GoldenX86@reddit

Last time I tested it with an Iris Xe, Vulkan worked fine. Just buy RAM.

[-]

Jatilq@reddit

Saw this a little while ago in one of the AI subs. Maybe worth looking into.

Local-first AI orchestration via Transformers.js & WebGPU. Express/Electron hybrid for low-end hardware. Vision, TTS, STT, and Music Generation.

https://github.com/LoanLemon/Omnix

[-]

akram200272002@reddit

I am confident that several labs around the world are working day and night to produce something that can do inference much more cheaply than what's available on the market. There's just way too much cash in the market for people not to try to pull this off.

[-]

cosmos_hu@reddit

Bro, give it some RAM, it's gonna be 8GB lol

[-]

SilverRegion9394@reddit (OP)

Wait fr????

[-]

floconildo@reddit

[image]
