Newbie Question about GPU choice
Posted by mundane_marietta@reddit | LocalLLaMA | 13 comments
Use case - training a model on 10 years of my writing, high school football player data, scouting reports, historical stats, etc., so that I can create a model that churns out 25 articles a day (between 250-750 words) for my football recruiting website.
I have good deals in place for a 5070 for $475 and a 4080 for $715, tax included. I just need to decide which one would be the best value for my use case. My local Microcenter does have a few 3090s available for $775.
I have no idea what I'm doing, so the upfront investment does seem daunting as the prices climb, but the season is almost over, and I believe with time, I can figure out what to do.
Not sure if this is the appropriate place to ask this question, and I know VRAM is king, but not sure if a 5070 could do the trick for my use case.
BumbleSlob@reddit
Arguably you should be using something like Kiln AI (https://kiln.tech/) and offloading fine-tuning to the cloud for your use case.
mundane_marietta@reddit (OP)
Why do you say that?
And if so, I guess the 5070 would be the more prudent option
Evening_Ad6637@reddit
Getting your own GPU to train models is primarily for people who do it because they genuinely love the process itself: it's their hobby to dive deep into the entire fine-tuning topic, spending huge amounts of time learning, tinkering, and doing lots of trial and error, and not feeling frustrated by it but actually enjoying it. And let me tell you, this can become a pretty expensive hobby.
For this kind of tinkering, even very small models are perfectly fine to start with, and you can just use third-party, publicly available datasets.
But your primary goal isn't the fine-tuning process itself. You want to use your own personal experience as a dataset to ultimately have the model write correct and accurate articles exactly the way you want it to.
To do that, you need two things: first, a reasonably smart model with enough parameters, and second, adequately powerful hardware. You might even need to do a 'proper' full fine-tune and not just a LoRA. An RTX 5070 isn't going to cut it here.
Your money would be far better and more effectively spent on cloud computing. You can rent hardware there by the hour or day that is one or two orders of magnitude more expensive (and powerful) than those consumer cards you mentioned. Plus, you don't have to deal with all the setup and environment hassle. This will save you money, time, and sanity.
Keep in mind that you'll likely already be spending a ton (and I mean a TON) of time creating and cleaning your dataset anyway.
Oh, and if you want to get a quick feel for how much this kind of cloud computing costs, I can recommend https://runpod.io from my own experience.
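To put rough numbers on why a consumer card falls short for full fine-tuning, here's a back-of-the-envelope sketch. The ~16 bytes/parameter figure (weights, gradients, and Adam optimizer states in mixed precision, ignoring activations) is a common rule of thumb, and the rental rate is an illustrative assumption, not a quoted price:

```python
# Back-of-the-envelope VRAM and cost estimates (illustrative assumptions only).

def full_finetune_vram_gb(params_billion: float, bytes_per_param: float = 16) -> float:
    """Rough VRAM for full fine-tuning with Adam in mixed precision:
    ~16 bytes/param (weights + gradients + optimizer states), ignoring activations."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# Even a modest 8B model needs on the order of 128 GB of model state --
# far beyond a 5070's 12 GB or a 4080's 16 GB.
needed = full_finetune_vram_gb(8)
print(f"8B full fine-tune: ~{needed:.0f} GB vs 12 GB on a 5070")

# Cloud alternative: hypothetical rental rate for an 80 GB data-center GPU.
hours, rate_per_hour = 20, 2.50
print(f"Rental estimate: ${hours * rate_per_hour:.2f} for {hours} h")
```

The exact numbers shift with LoRA, gradient checkpointing, or 8-bit optimizers, but the order of magnitude is why renting usually wins for a one-off training run.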
mundane_marietta@reddit (OP)
Thanks for the response. This is a lot to take in.
Just to clarify, I'm not looking for a hobby, but I do see potential value in having a local AI that can recreate my content, pulling historical stats, using my scouting reports, or just providing quick information, all in a central location. I often waste a lot of time researching information on top of writing. I have written a couple of thousand articles, but aside from that, most of my data is stored in xlsx documents.
So you believe cloud computing is the best bang for my buck when it comes to training? What about after I'm done training? Could a 5070 not handle the inference locally, or would this model be too big? What about data and privacy? Stuff like my scouting reports is not publicly available.
In an ideal world, I'd like to package all of this together and sell to college programs, but for now, creating a central hub for Georgia high school football is my goal for 2026. I don't mind losing my sanity in the process if it can reduce my workload and increase productivity for the foreseeable future.
Evening_Ad6637@reddit
Yes, exactly. I think for training the model, it's much smarter and more effective to invest your money in rented computing power.
After training, you can download your newly trained model from the rented server and run inference completely locally and offline. However, I don't think the RTX 5070 is ideal for this situation either; the RTX 3090 offers you the best price-to-performance ratio: more VRAM and significantly higher memory bandwidth = faster inference.
Regarding privacy and data protection, I think you're on the pretty safe side with an SSH connection to the rented server. But you could also go a step further, set up a VPN connection, and only allow SSH within the private network. Personally, though, I generally prefer not to use US companies when working with very critical or sensitive data, and instead rent something from the German provider 'Hetzner' for such workloads.
By the way, if you're still wondering which model would even be an option, I would consider something like llama-3.1-8B as the absolute bare minimum. Ideally, though, I would rather train something like Mistral-Small-24B. That model is really smart, reliable, and just a workhorse. And inference on a 3090 works very well with it, with a good context window when the model is quantized to Q4, without overloading the GPU.
And remember, while MoE models are tempting because they're so efficient for inference, experience shows they are much harder to train than dense models.
I hope this helps, otherwise feel free to ask.
ItilityMSP@reddit
The landscape changed this week: Unsloth unlocked FP8 reinforcement learning for RTX 50-series Blackwell chips. I would get a 5060 Ti or two and train off of that; it draws less power than a 4060 and lets you take advantage of Blackwell going forward. Train using Qwen3-8B and you'd have lots of headroom for other pieces like a router, an LLM judge, or even a second writing model.
mundane_marietta@reddit (OP)
I'm kind of locked into the 5070 or 4080 right now. Maybe in a year I could buy another 5070 and a new power supply to double up and get to 24 GB.
Maybe for now, train on the cloud, but will the 5070 be good enough at running the model locally?
Tyme4Trouble@reddit
I am not aware of any combination of enthusiast hardware or model that will generate results that aren’t riddled with errors. You’ll be filling your site with slop posts (25 a day is also likely to sound alarms for search engine crawlers).
My suggestion is to focus on building targeted research tools to help YOU write fewer high quality stories faster and with less effort.
mundane_marietta@reddit (OP)
Oh, okay, so it's really not feasible at the moment to scale out a local writing assistant in my own style without it sounding like AI slop? I thought if I fine-tuned the training, it would provide decent results.
So, focusing on a research tool that cuts down on time would be great too, and still something I could potentially package and sell to colleges.
mlrunlisted1@reddit
Grab the 3090 at $775. 24GB VRAM crushes fine-tuning and inference for your 7B model and future-proofs you. Best value by far.
mundane_marietta@reddit (OP)
Yeah, doesn't seem like bad value.
Would two 3060 12 GB GPUs provide similar results? My Microcenter has two for $200, and my motherboard has two PCIe Gen 5 x16 slots, so there's plenty of bandwidth.
Own_Attention_3392@reddit
Training models requires an absolutely astronomical amount of VRAM, far more than the amount of VRAM required to actually run a model.
You might look into other techniques for achieving similar goals without fine-tuning a model: the football stats can be handled via RAG or MCP to pull the appropriate data from a statistics API.
Most models will do fine emulating a writing style as long as you give them a few examples of how you write; full training is not necessary.
I suspect you can achieve your objectives for roughly $0.
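As a toy illustration of the RAG idea: treat each spreadsheet row as its own retrievable chunk and rank rows against a question. This sketch uses crude keyword overlap in pure Python just to show the shape of it (the rows and names are made up; a real setup would use embeddings and a vector store):

```python
# Minimal retrieval sketch over spreadsheet-style rows, one row per "chunk".

def score(query: str, text: str) -> int:
    """Count query words that appear in the row text (crude keyword overlap)."""
    return len(set(query.lower().split()) & set(text.lower().split()))

rows = [  # hypothetical prospect rows flattened to text
    "name: John Smith position: QB class: 2026 school: Marietta yards: 3200",
    "name: Dee Jones position: WR class: 2025 school: Buford receptions: 81",
    "name: Max Lee position: LB class: 2026 school: Valdosta tackles: 112",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k best-matching rows; these would be pasted into the LLM prompt."""
    return sorted(rows, key=lambda r: score(query, r), reverse=True)[:k]

print(retrieve("2026 QB Marietta"))  # the John Smith row ranks first
```

The retrieved rows get stuffed into the model's context at question time, so the model never needs to be trained on the stats at all.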
mundane_marietta@reddit (OP)
I can pull most of the data online and put it into an xlsx document, so from my brief research, RAG should work well. But would I need to chunk the dataset while embedding? How would that work with an Excel sheet with over 2,000 prospects in it, and a larger one with close to 15k? Or even datasets like this? https://ghsfha.org/w/Special:GHSFHA/season/players/2024
So the 4080 wouldn't make much of a difference compared to the 5070 when it comes to training? I'm also upgrading GPUs to improve my video editing experience, so I'm making the upgrade regardless.
I guess what I'm trying to figure out is whether I could train this model in the cloud and then eventually run it locally on my PC. Does it really matter which GPU I pick between the two options?
I don't mind spending money, and with the GPU shortage heading into 2026, the resale value will probably remain consistent. But I also agree, I need to check out services like RunPod, it seems.