Cheap hardware for mediocre LLMs
Posted by Clean_Archer8374@reddit | LocalLLaMA | View on Reddit | 2 comments
Hi everyone, I've been playing around with the software side on an RTX 3090, but I'm wondering what hardware I could experiment with to run a quantized 70-120B model. I don't know what could be done beyond buying more RTX 3090s. I'm considering offloading to RAM, but is there anything realistic on the hardware side, some hardware adventure that gets enough memory bandwidth to run an LLM of that size at reasonable inference speeds (at least 5, ideally 10 tokens per second)? Even if it requires hardware hacking, I'm thankful for any creative ideas.
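A quick way to sanity-check the bandwidth question: during decoding, a dense model roughly has to read its entire weight set from memory once per generated token, so memory bandwidth divided by model size in bytes gives a theoretical ceiling on tokens per second. The sketch below uses illustrative bandwidth numbers (not benchmarks) and ignores KV-cache traffic and compute overlap:

```python
# Rough upper bound on decode speed for a dense model:
# tokens/s <= memory_bandwidth / bytes_read_per_token.
# Bandwidth figures below are assumptions for illustration.

def max_tokens_per_s(params_b: float, bits_per_weight: float, bandwidth_gb_s: float) -> float:
    """Theoretical decode-speed ceiling, reading all weights once per token."""
    bytes_per_token = params_b * 1e9 * bits_per_weight / 8  # total weight bytes
    return bandwidth_gb_s * 1e9 / bytes_per_token

# A 70B model at 4-bit means ~35 GB of weight reads per token.
# RTX 3090 GDDR6X: ~936 GB/s; dual-channel DDR5 system RAM: ~90 GB/s (assumed).
print(round(max_tokens_per_s(70, 4, 936), 1))  # ~26.7 tok/s if it all fit in VRAM
print(round(max_tokens_per_s(70, 4, 90), 1))   # ~2.6 tok/s fully offloaded to RAM
```

This is why pure CPU/RAM offload struggles to hit the 5-10 tok/s target for 70B+ dense models, and why answers tend to come back to stacking more VRAM.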
Yes-Scale-9723@reddit
Used 3090s are still the best value for money.
H_NK@reddit
Tbh, more 3090s is unfortunately still the meta.