I'm just trying to run a digital waifu: GGUF files, image generation, TTS, and talk-llama-fast. A 4060 Ti can do all of this, just not all of it at once. KoboldAI + SillyTavern for roleplay, and Stability Matrix/ComfyUI for image generation with models from Civitai. For video generation, 16 GB of VRAM is enough with FramePack, but I don't have 64-128 GB of DDR4/DDR5.
But it can't even do FP4? The RTX 5000 series can do FP4. Maybe they're not even trying to sell this card to us AI enthusiasts and are just targeting gamers, video editing, etc.
I think the idea with having FP8 and FP4 support is that the GPU has to do fewer calculations to go from FP16 to 4-bit for a given layer. I'm really impressed by the dynamic quants like GPTQ that keep some layers at higher bit widths and put other layers at lower ones like 4-bit, since those layers affect performance/accuracy less. Instead of quantizing a whole model to 4-bit, we may have some layers at 4-bit, others at 8, others at 16, and so on, and end up with really good performance for the amount of compute. I imagine FP4 support would mean better performance/less compute on the 4-bit layers, but I'm not too knowledgeable on the subject yet.
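To put rough numbers on the mixed-precision idea, here is a minimal sketch of how weight storage changes when layers use different bit widths. The 7B model, the 32 equal-sized layers, and the particular 8-bit/4-bit split are all hypothetical values picked for illustration, not how any specific quantizer assigns bits.

```python
def model_size_gb(layer_params, layer_bits):
    """Estimate weight storage in GB given per-layer parameter counts and bit widths."""
    total_bits = sum(p * b for p, b in zip(layer_params, layer_bits))
    return total_bits / 8 / 1e9  # bits -> bytes -> GB (decimal)

# Hypothetical 7B-parameter model split into 32 equal layers.
params = [7e9 / 32] * 32

uniform_fp16 = model_size_gb(params, [16] * 32)
uniform_4bit = model_size_gb(params, [4] * 32)

# Hypothetical mix: first and last 4 layers kept at 8-bit, the rest at 4-bit.
mixed_bits = [8] * 4 + [4] * 24 + [8] * 4
mixed = model_size_gb(params, mixed_bits)
average_bits = sum(mixed_bits) / len(mixed_bits)

print(f"fp16: {uniform_fp16:.2f} GB, 4-bit: {uniform_4bit:.2f} GB, "
      f"mixed: {mixed:.3f} GB at {average_bits:.1f} bits/weight on average")
```

This only counts weights; it ignores quantization scales, KV cache, and activations, so real files will come out somewhat larger.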
Correct me if I'm wrong, but Nvidia currently controls the market, right? Wouldn't it be better for AMD/Intel to get a foothold so more tools will work with their cards?
That would be a way to deliver massive value for customers, but the business goons have their hearts set on delivering massive value to shareholders by selling data center GPUs instead.
No need to compete when there's only two choices in the market and you can simply match your competitor rather than undercutting them on price aggressively.
But that IS competition.
This isn't a charity.
You're always compromising per-unit profit versus total profit in your pricing.
And you're always trying to get the best selling price you can.
Right now there's a flood of institutional, corporate and government money (which flows into institutional and corporate) buying away resources from we, the people.
That's a real problem that takes some learning to understand.
Yeah, it is 33% more per GB based off MSRP pricing, but I am not sure how available the $2000 5090 FE is. Realistically, if you want an RTX 5090 today you are going to spend $3000+. Meanwhile, previous generations of RTX workstation cards are generally available at MSRP.
I checked and it's available for $2k on the Best Buy USA website. I found several others around $2200 as well. So I think if you try you can get it for MSRP.
And $8500 is still a speculated/leaked price AFAIK, not MSRP.
I just checked the Best Buy website and there is a product listing for the Founder's Edition at $2000, but it is "Sold Out" and, apart from occasional stock drops, has been that way since launch. If you search on Newegg for stock available to ship, it is all priced beyond $3000.
Electronics prices are a general shitshow thanks to Trump's tariffs. Like I said, we'll see what the price of the RTX Pro 6000 will be once it's actually available to order.
The cheapest 5090 on Newegg is $2500. Three of them is $7500. That means there is an extra $1000 premium for the VRAM on an RTX Pro 6000, which is an extra $10/GB. So sorry for the egregious lie. I'm sorry the price of a fast food meal is too big a lie for you to countenance.
No. That's the price of 96 fast food meals.
And 30% difference in price.
So quite the bullshit. You were wrong - own it, instead of shifting goalposts.
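For anyone who wants to redo the arithmetic with their own numbers, here is a small sketch. It plugs in the figures quoted in this thread ($2500 for the cheapest Newegg 5090, the speculated $8500 for the RTX Pro 6000, 96 GB of VRAM); the function name is made up for illustration, and the prices are whatever the commenters reported, not confirmed MSRPs.

```python
def vram_premium(single_card_price, cards, big_card_price, big_card_vram_gb):
    """Compare one large-VRAM card against a stack of smaller cards.

    Returns (stack price, premium in dollars, premium per GB, premium in percent).
    """
    stack_price = single_card_price * cards
    premium = big_card_price - stack_price
    return (stack_price,
            premium,
            premium / big_card_vram_gb,
            premium / stack_price * 100)

# Figures as quoted in the thread; the $8500 price is speculated, not official.
stack, premium, per_gb, pct = vram_premium(2500, 3, 8500, 96)

print(f"3x 5090: ${stack}")                        # 7500
print(f"premium: ${premium}")                      # 1000
print(f"premium per GB: ${per_gb:.2f}")            # about 10.42
print(f"premium over the 5090 stack: {pct:.1f}%")  # about 13.3
```

Swap in $2000 or $3000 for the 5090 price to see how much the conclusion depends on which listing you count as "the" price.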
> Nvidia is releasing 96 gb cards to the consumer
Enterprise, and don't mistake it for goodwill. Extra VRAM does not make it worth its 8k price tag; memory modules don't cost 1k apiece like Nvidia seems to be trying to tell us.
It is not worth it to us consumers, but we're not their target market. It is for companies that won't blink at spending $30k on a computer for their ML engineer. After all, what's $30k if you are already paying the engineer half a million a year, especially if it makes them more efficient?
Not for enterprise users. "Pro" means it's a professional card for people who use it to make money, so even if it costs thousands (which it does), the card pays itself back in no time.
The last Radeon Pro card with 32GB VRAM (W7800) had an MSRP of $2,500.
"Us" referring to whom exactly? The only obvious thing here is that this is an expensive card aimed at the professional market, not the home/hobbyist user.
I'm sure there's plenty of enterprise/pro folks here who want to run models locally for the same reasons that home users do. Being able to better guarantee data privacy and security because you're not sending it over the internet (potentially to another country) to be processed on someone else's computer is very valuable in the professional space, not just for home users.
The most important thing for the target audience of this card is availability and the quality of support, not the price.
There's an Nvidia verified gamer/creator program now for buying an Nvidia 5080/5090 on the Nvidia marketplace at MSRP. If they think I would pay $500 more for a card with the same specs and no CUDA, then they're some dumb dumbs. Maybe the exception here would be someone wanting to buy several for a multi-GPU rig, but even then I imagine CUDA with some 4090s or 3090s would be better. I suppose there's the possibility that they're going to surprise us with some new CUDA-like software that justifies the MSRP, but I doubt it.
Given the lack of CUDA, what is the most y'all would pay for this GPU? Comment below.
Enterprise cares about all the certifications and support you don't get with consumer cards. Nvidia is still selling 32 and 24 GB Pro cards even though the 5090 exists.
The RDNA 4-based card with 32GB is likely to be a successor or comparable to the W7800, given the similar memory capacity and professional focus. The W7800’s $2,499 price sets a baseline.
Desktop/server DDR can do this because it has chip-select pins, so it can support multiple ranks per channel. GDDR doesn't have them, so all it can do is clamshell rather than add ranks. 32GB on a 256-bit GDDR6 bus already uses the highest-capacity GDDR chips available and combines them with clamshell, so there's no further chance of doubling the capacity.
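The capacity ceiling described above falls out of simple arithmetic. A rough sketch, assuming each GDDR6 chip exposes a 32-bit interface and the densest available chip is 2 GB (an assumption chosen to match the 32GB figure in the comment, not a value from a datasheet):

```python
def max_vram_gb(bus_width_bits, chip_gb, clamshell=False, chip_interface_bits=32):
    """Maximum VRAM for a GDDR bus with no ranks: one chip per 32-bit slice,
    or two chips sharing each slice in clamshell mode."""
    chips = bus_width_bits // chip_interface_bits
    if clamshell:
        chips *= 2  # two chips per slice, each running at half width
    return chips * chip_gb

# Assumed: 2 GB per chip as the densest GDDR6 part available.
print(max_vram_gb(256, 2))                  # 8 chips  -> 16 GB
print(max_vram_gb(256, 2, clamshell=True))  # 16 chips -> 32 GB
print(max_vram_gb(384, 2, clamshell=True))  # 24 chips -> 48 GB
```

With no chip-select pins to add ranks, the only remaining knobs are a wider bus or denser chips, which is why a 256-bit card stops at 32 GB under these assumptions.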
Someone figured it out...
[https://www.reddit.com/r/LocalLLaMA/comments/1j6i1ma/comment/mgp30xg/](https://www.reddit.com/r/LocalLLaMA/comments/1j6i1ma/comment/mgp30xg/)
Still, why do they limit themselves?
This is AMD, not some random very small business with a handful of people that takes some "old" 24GB GPUs and turns them into 48GB ones...
Yet those very small businesses manage to do it and AMD doesn't.
Some are even sold for about $3000.
Blame the system. See, high demand = high cost. That means high cost for us and high cost for the manufacturer. Memory chips are used everywhere, and the particular ones used on GPUs are a very special kind.
It's also not a question of whether they can; they decide to limit it for business reasons (gotta milk the consumers to make as much profit as they can).
It's because AMD execs all have Nvidia stock, so if they release a product that is too good they will personally lose money. They're gimping themselves on purpose.
They limit themselves to the smaller memory bus for cost/yield reasons: memory controllers are more sensitive to defects, and they don't scale as well with smaller nodes. AMD 100% could make a 512-bit version of the 9070 XT die LOL, but that would cost a LOT of money per chip (in addition to the fixed cost of the tape-out, which is usually in the tens of millions of dollars).
The 24 GB to 48 GB conversion is possible probably bc whatever GPU that was has a bigger memory bus.
AMD makes the 48GB W7800 with a $2500 MSRP.
Partners used to be able to put more VRAM on GPUs, but that is now forbidden by AMD and Nvidia, and I guess Intel too. The reason is to avoid cannibalizing the professional market, where they charge absurd premiums for the extra VRAM.
W7800
They will run what? ROCm? LOL. The only way to make them usable is to sell them for $380-400 MAX. That would be a good card for LLMs, not with ROCm but with Vulkan.
I have an RX 7900 XTX and I'm running ROCm on Windows 11 with LM Studio. Its speed is 92% of Vulkan's, but with better DDR5 memory management. I have no complaints. What am I missing?
Linux ROCm here. Almost every image or video generation tool is built for CUDA, not ROCm, or has problems with ROCm due to shitty code.
For LLM text generation on Linux, Vulkan does not require anything, no LTS version of Ubuntu or whatsoever. ROCm requires an LTS version, and that's a problem on Linux.
Vulkan works without installing anything. Vulkan is faster than ROCm. Vulkan is not locked to LTS releases. Vulkan is supported on 99% of Linux distributions.
Fedora is not on the official list of compatible distros; one update and goodbye to your working distro :)
[https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-distributions](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-distributions)
L-O-L.
Even if that were true, performance is still shit:
[https://github.com/ollama/ollama/pull/5059#issuecomment-2816882002](https://github.com/ollama/ollama/pull/5059#issuecomment-2816882002)
CUDA or Vulkan; the other stuff is currently shit. I love my AMD GPU, but for AI... AMD really needs to wake up.
From some posts on llama.cpp, flash attention is only available on GPUs with the **coopmat2** extension. It has nothing to do with Vulkan AFAIK.
On other GPUs, if you enable flash attention, it swaps data to RAM and uses the CPU, which drags performance down because of the constant swapping between RAM and VRAM.
> ROCm require LTS version, it's a problem on linux.
So do many CUDA[-based] libraries, and yet they do run fine on my Kubuntu 24.10.
I agree that Vulkan seems to be a better solution than ROCm -- at the moment.
As a side note, I'm yet to see a hardware company, any HW company, that is good at software.
The UI always looks like it was designed by their marketing department alone... Thankfully, we no longer have NVIDIA-styled green bitmapped buttons that stuck out like sore thumbs, but it still leaves a lot to be desired.
NVIDIA superiority complex.
Right now NVIDIA **is** superior in software support, by far: CUDA enjoys default status, while ROCm is an add-on. But I have a feeling this will change, and then it will be good to have already looked into alternatives.
W7900 was 48GB. RDNA doesn't have GDDR7 chips. Yes, the architecture is better, but it's not that good. If those cards have HBM3e, then it's another story. Because I don't really care about CUDA.
32GB for a workstation-class GPU when NV is delivering up to 96GB on Blackwell Pro is fairly weak. I'd hope to see 48/64/96GB cards to be competitive.
48GB Blackwell is ~$4600. In theory the 5090 32GB is $1999 (admittedly, good luck with that). Pricing has to make sense in that context, along with some discount to make up for the software stack and the uncertainty about actual card availability going forward. They could try for $1999-$2499 if they actually deliver and if 5090s remain elusive, maybe, but even that is a bit of a stretch.
If they offered some sort of NVLink-like interface between cards that could add value since NVLink disappeared from everything outside datacenter class.
A bit underwhelmed. AMD could really capture market by offering better $/GB even if all other specs are a bit behind. GDDR6 already means bandwidth is likely going to be a bit lame unless they've got some space magic, like a huge SRAM cache and prayers the software can utilize it effectively.
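The bandwidth worry is easy to quantify: peak memory bandwidth is just bus width times per-pin data rate. A quick sketch, where the 20 Gbps (GDDR6) and 28 Gbps (GDDR7) pin speeds and the bus widths are illustrative assumptions rather than confirmed specs for any card discussed here:

```python
def bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

# Assumed configurations, for illustration only.
gddr6_256 = bandwidth_gb_s(256, 20)  # 256-bit GDDR6 at 20 Gbps
gddr7_512 = bandwidth_gb_s(512, 28)  # 512-bit GDDR7 at 28 Gbps

print(f"256-bit GDDR6: {gddr6_256:.0f} GB/s")
print(f"512-bit GDDR7: {gddr7_512:.0f} GB/s")
print(f"ratio: {gddr7_512 / gddr6_256:.1f}x")
```

Since token generation is mostly memory-bound, a gap like this translates fairly directly into tokens per second, unless a large on-die cache picks up some of the slack.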
107 Comments
gfy_expert@reddit
SmellsLikeAPig@reddit
gfy_expert@reddit
b0tbuilder@reddit
SmellsLikeAPig@reddit
gfy_expert@reddit
CarefulGarage3902@reddit
SmellsLikeAPig@reddit
CarefulGarage3902@reddit
ResponsibleTruck4717@reddit
EugenePopcorn@reddit
ResponsibleTruck4717@reddit
EugenePopcorn@reddit
crantob@reddit
Bandit-level-200@reddit
Medium_Chemist_4032@reddit
grady_vuckovic@reddit
crantob@reddit
emprahsFury@reddit
KontoOficjalneMR@reddit
frankchn@reddit
KontoOficjalneMR@reddit
frankchn@reddit
KontoOficjalneMR@reddit
frankchn@reddit
KontoOficjalneMR@reddit
frankchn@reddit
Hunting-Succcubus@reddit
avinash240@reddit
Hunting-Succcubus@reddit
emprahsFury@reddit
KontoOficjalneMR@reddit
thrownawaymane@reddit
emprahsFury@reddit
kb4000@reddit
Bandit-level-200@reddit
emprahsFury@reddit
frankchn@reddit
My_Unbiased_Opinion@reddit
HugoCortell@reddit
Xyzzymoon@reddit
custodiam99@reddit
FastDecode1@reddit
BusRevolutionary9893@reddit
FastDecode1@reddit
HugoCortell@reddit
CarefulGarage3902@reddit
Ninja_Weedle@reddit
nostriluu@reddit
bblankuser@reddit
Such_Advantage_6949@reddit
PorchettaM@reddit
b3081a@reddit
BusRevolutionary9893@reddit
Such_Advantage_6949@reddit
resnet152@reddit
custodiam99@reddit
Rustybot@reddit
gfy_expert@reddit
512bitinstruction@reddit
Healthy-Nebula-3603@reddit
b3081a@reddit
Conscious_Cut_6144@reddit
b3081a@reddit
Conscious_Cut_6144@reddit
b3081a@reddit
Healthy-Nebula-3603@reddit
Hunting-Succcubus@reddit
Alphasite@reddit
KontoOficjalneMR@reddit
Healthy-Nebula-3603@reddit
AmazinglyObliviouse@reddit
relmny@reddit
eding42@reddit
relmny@reddit
Txt8aker@reddit
Allseeing_Argos@reddit
eding42@reddit
asssuber@reddit
Nexter92@reddit
custodiam99@reddit
Nexter92@reddit
MikeLPU@reddit
Nexter92@reddit
rusty_fans@reddit
Nexter92@reddit
InsideYork@reddit
MikeLPU@reddit
AppearanceHeavy6724@reddit
giant3@reddit
AppearanceHeavy6724@reddit
giant3@reddit
Nexter92@reddit
AppearanceHeavy6724@reddit
Nexter92@reddit
AppearanceHeavy6724@reddit
plankalkul-z1@reddit
custodiam99@reddit
WolpertingerRumo@reddit
custodiam99@reddit
DrBearJ3w@reddit
HistorianPotential48@reddit
Ok_Top9254@reddit
Freonr2@reddit
mindwip@reddit
Sicarius_The_First@reddit
beedunc@reddit