craftogrammer@reddit
It's time for a VRAM downloader site, like we had RAM downloader. Things are changing so fast.
Fit-Celebration2884@reddit
64 GB page file lock in
Gipetto@reddit
I feel very fortunate to be a Mac user.
No-Diet-8008@reddit
Welcome to the club, pal. Well, at least I've got 12GB of RAM, which I'm using to get about 4 t/s. It's certainly not Ava, but at least she's talking to me.
mhb-11@reddit
My 4060 8GB VRAM seems to not do anything useful. I totally feel for your 128MB 🥹
daddywookie@reddit
I’ve got a 2GB card spare for you bro. Otherwise I’m trying to get my 8GB Intel card to perform and wishing I had a job to buy an upgrade.
mzrdisi@reddit
What kind of models can you run on a 2GB card and how do you optimize them to be productive?
daddywookie@reddit
I asked ChatGPT exactly that and it basically said “lol, no”.
--Spaci--@reddit
Think for yourself. You can run Qwen 3.5 0.8B at Q4_K_M, just not well, and you won't be very productive with it at anything.
russjr08@reddit
I assume that's probably what ChatGPT told them. "Yes, it's possible but..." hence the "lol, no" paraphrasing.
Kahvana@reddit
Qwen3.5-0.8B and 1B models like Granite 4.0 H 1B
rainbyte@reddit
I used LiquidAI LFM2 and LFM2.5 models on an Intel iGPU for some time, until I got a discrete GPU, and it was pretty good for learning about inference and getting it to solve some use cases. I still use it on the go on my phone :)
Excel_Document@reddit
Qwen 3 2B at Q4? Like 1 GB for the model and 1 GB for llama.cpp and attention.
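The back-of-the-envelope math behind that estimate can be sketched roughly like this. It assumes a flat 4 bits per weight (real Q4 GGUF quants usually land a bit higher) and takes the 1 GB of runtime and attention overhead from the comment as a fixed guess; the function names are made up for illustration and are not part of llama.cpp.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB.

    Assumption: every weight costs exactly `bits_per_weight` bits,
    which ignores quantization scales and any layers kept at higher precision.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.0) -> bool:
    """Check whether weights plus a fixed overhead guess fit in VRAM.

    `overhead_gb` is a rough allowance for the runtime and attention/KV cache,
    not a measured value.
    """
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    weights = estimate_weight_gb(2, 4)
    print(f"2B model at 4 bits: ~{weights:.1f} GB of weights")
    print("Fits on a 2 GB card:", fits_in_vram(2, 4, vram_gb=2.0))
```

By this estimate a 2B model at 4 bits only just fits on a 2 GB card, so a longer context or a heavier quant would push it over.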
mzrdisi@reddit
But is there utility? Or just novelty / testing?
Kahvana@reddit
Even Intel UHD 605 with 8GB system ram can run Qwen3.5-2B at Q4_K_S with 2t/s generation and 50 t/s processing.
Eyelbee@reddit
You can do cpu inference and get decent t/s on some models
Silver-Champion-4846@reddit
Only if you have enough system ram
CircularSeasoning@reddit
Enough is all you need.
.... more doesn't help.
Creative-Type9411@reddit
I have insane amounts of RAM and only 16 GB of VRAM. 2 t/s on giant models is cool for like the first day, and then as soon as you run Qwen3.6 35B A3B MoE with MTP and get like 40 tokens a second on the exact same hardware, you realize RAM only does so much. Even setups where people split a model across GPUs take a hit compared to a single GPU because of the bandwidth needed for them to talk to each other.
But not being able to run them at all seems worse somehow, even though they're almost unusable.
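A rough way to see why a MoE with about 3B active parameters feels so much faster than a dense model on the same machine: token generation is mostly memory-bandwidth bound, so an upper bound on speed is bandwidth divided by the bytes read per token. The sketch below assumes 4 bits per weight and a hypothetical 60 GB/s of memory bandwidth. Both numbers are illustrative guesses, not measurements from this thread, and the estimate ignores the KV cache, compute limits, and any CPU/GPU split.

```python
def estimate_tokens_per_second(bandwidth_gb_s: float,
                               active_params_billion: float,
                               bits_per_weight: float = 4.0) -> float:
    """Upper-bound tokens/s for bandwidth-bound decoding.

    Assumption: each generated token reads every *active* weight once,
    and nothing else limits throughput.
    """
    bytes_per_token_gb = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / bytes_per_token_gb


if __name__ == "__main__":
    bandwidth = 60.0  # GB/s, hypothetical system RAM bandwidth

    # Hypothetical dense 35B model: all 35B weights are read per token.
    dense = estimate_tokens_per_second(bandwidth, active_params_billion=35)

    # MoE with 35B total but only ~3B active parameters per token.
    moe = estimate_tokens_per_second(bandwidth, active_params_billion=3)

    print(f"dense 35B: ~{dense:.1f} t/s, 3B-active MoE: ~{moe:.1f} t/s")
```

With these assumed numbers the dense model comes out at a few tokens per second and the MoE at around ten times that, the same order of gap the comment describes.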
CircularSeasoning@reddit
I have just enough RAM for Windows bloat + llama-server + Qwen3.6 35B A3B. My life is, in a sense, complete.
Puzll@reddit
One day 🙏
ttkciar@reddit
I'm pretty happy about the recent releases too, especially the larger models, even though I "only" have 32GB GPUs.
My attitude is, if AI Winter falls tomorrow, whatever is available now might be all we get to have thereafter, at least until the open source community acquires the hardware to advance the technology ourselves.
The hardware will trickle down into our hands via the second-hand market, eventually. The difference between technology that costs as much as a luxury sedan and technology that costs as much as a burrito is about eight years.
With such powerful models as GLM-5.1, MiMo-V2.5-Pro, and MiniMax-M2.7 available now, even if all advances stop right here, we'll be in a really happy place for many years to come. As better hardware becomes available, these more powerful models will be ours to use on that hardware.
We would also be able to leverage these larger models to make better small models via distillation, so if some of us get more powerful hardware and the rest lag behind with 12GB or 24GB GPUs, as our datasets and distillation pipelines improve, so should the models which will fit in those smaller GPUs.
JohnnyQuant@reddit
Nvidia has deals to buy back used datacenter GPUs so they can destroy them and keep prices artificially high. That is why the second-hand market is weak.
Athabasco@reddit
A quick Google search doesn’t provide a source for this claim, but I wouldn’t be surprised if it’s true. Do you have a source?
--Spaci--@reddit
That is the most absurd thing I have ever heard of Nvidia doing.
8P8OoBz@reddit
As Chinese competition closes in, this will kill their own market share.
ttkciar@reddit
I am very glad AMD does not do that.
My future upgrade plan from my current MI50/MI60 homelab is: MI210 --> MI350P --> MI455X
If the future of hardware in the open source community is non-Nvidia, then perhaps support for other vendors' hardware in PyTorch, Unsloth, etc. will improve.
GoldenX86@reddit
Last time I tested it with an Iris Xe, Vulkan worked fine. Just buy RAM.
Jatilq@reddit
Saw this a little while ago in one of the AI subs. Maybe worth looking into.
Local-first AI orchestration via Transformers.js & WebGPU. Express/Electron hybrid for low-end hardware. Vision, TTS, STT, and Music Generation.
https://github.com/LoanLemon/Omnix
akram200272002@reddit
I am confident that several labs around the world are working day and night to produce something that can do inference much more cheaply than what's available on the market. There's just way too much cash in the market for people not to try to pull this off.
cosmos_hu@reddit
Bro, give it some RAM, it's gonna be 8GB lol
SilverRegion9394@reddit (OP)
Wait fr????
floconildo@reddit
[image]