Looking for an LLM that is close to GPT-4 for writing or RP
Posted by Intrepid-Biscotti912@reddit | LocalLLaMA | 7 comments
Hey everyone,
Quick question: with 288GB of VRAM, what kind of models could I realistically run? I won’t go into all the hardware details, but it’s a Threadripper setup with 256GB of system RAM.
I know it might sound like a basic question, but the biggest I've run locally so far was a 13B model on a 3080 and a 4060 Ti. I'm still pretty new to running local models (I've only tried a couple so far), and I'm just looking for something that works well as a solid all-around model, or maybe a few I can switch between depending on what I'm doing.
dreamyrhodes@reddit
The Cydonia finetune by The Drummer is 24B, and the GGUF runs on a 4060 Ti at a somewhat acceptable speed in its Q4_K_M quant. To my knowledge it's the best model so far for this hardware range.
But GPT-4 is reportedly about 1.8T parameters, so guess how far you would get with a local installation, even if you get to 70B or 250B with CPU offloading (which slows everything down significantly).
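For reference, here's a minimal llama-cpp-python sketch of what running a Q4_K_M GGUF with GPU offload looks like; the model filename is a placeholder, and n_gpu_layers is the knob that decides how many layers live in VRAM versus system RAM:

```python
# Hedged sketch using llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename below is a placeholder, not an exact release name.
from llama_cpp import Llama

llm = Llama(
    model_path="Cydonia-24B-Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # -1 = offload every layer to the GPU if it fits
    n_ctx=8192,       # context window; larger contexts cost more VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short opening scene for an RP."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```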
Intrepid-Biscotti912@reddit (OP)
I'm more talking about the VRAM in my other hardware. I don't run LLMs on my computer anymore XD. If anything, my desktop has mostly become a host for Plex and file storage. I do aim to move it to the server at some point, once everything is set up and I actually have it in my possession.
Why run 24B on an A6000 Blackwell when there are plenty of other options XD. I'm trying to find a "good enough" replacement for general chatter and RP. I know local LLMs won't get near GPT-4 in all aspects.
LagOps91@reddit
Kimi is likely too large, but GLM 4.6, Ling, or DeepSeek V3.1 should all run on your setup. With models that large, even a Q2 quant is fine to run. Lots of models for you to try out. I wouldn't bother with GLM 4.5 Air; 4.6 is much better/smarter and makes better use of your hardware.
TheRealMasonMac@reddit
GLM-4.6 or wait for GLM-4.6-Air.
Klutzy-Snow8016@reddit
You could try GLM 4.5, GLM 4.6, and the different versions of Qwen3 235B-A22B (2507-Instruct, 2507-Thinking, VL-Instruct, VL-Thinking). You might be able to get even larger models like DeepSeek, Kimi, or Ling-1T running, but heavily quantized and with CPU offload.
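If you do try one of the really big ones, the same llama-cpp-python interface handles the CPU-offload part; a hedged sketch (the filename and layer count are placeholders, not tuned values):

```python
# Hedged sketch of partial CPU offload with llama-cpp-python; the filename
# and layer count are placeholders for whatever quant you actually download.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-V3.1-Q2_K.gguf",  # hypothetical heavy quant
    n_gpu_layers=40,  # layers beyond this stay in system RAM on the CPU,
                      # which is where the big slowdown comes from
    n_ctx=4096,
)
```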
Adventurous-Gold6413@reddit
Not sure tbh. I've only got 76GB of combined VRAM and RAM available and run GLM 4.5 Air; I really like it. I don't really use it for much, but I think it's good.
Intrepid-Biscotti912@reddit (OP)
Yeah, I know most people run smaller models, but I was hoping people had ideas for larger ones. I don't see myself using it a ton, but if it can come close to replacing GPT-5 for my general usage (I don't do coding or anything beyond writing, RP, and just conversation), I'd be more than willing to cancel my GPT sub.