Please help me
Posted by aaAS69@reddit | LocalLLaMA | 18 comments
hey guys. I'm a student that uses AI for research and feedback on my work. Claude flagged me as being under 18, so I got banned and lost a considerable amount of data. To avoid that happening again I want to use a local LLM. I have an RTX 5080 build that I use for gaming; is that adequate for running a Claude alternative? If so, what models should I use?
RedParaglider@reddit
Hell yea dude, you can run a small model on that bad boy. Just understand that you won't be able to tell it to go make you an application. If I were in your shoes I'd use one of the free chat systems to help build out a solid HLD (high-level design), then a solid SDD (software design doc), then a solid task-level development plan. Then you work with your small local LLM to code each module one at a time. The downside is that it's slower; the upside is that your modules will be hella tight, and you'll have to do some troubleshooting and manual programming along the way, so you'll slowly become a badass while you do it.
aaAS69@reddit (OP)
thanks a lot man. I'll definitely check that out, but from my (minuscule amount of) research I realized that using my PC to run an LLM will be more of a hobby project than a replacement for Claude. I tried a few models and realized ChatGPT is the closest I'm going to get.
ea_man@reddit
Can we have a pinned thread for these Claude exiles?
CATLLM@reddit
It depends on what kind of research you are doing. An LLM by itself is not enough; what makes Claude great is the harness / system prompt.
What type of research are you doing?
aaAS69@reddit (OP)
i'm in the IB, so I'll need summaries of research papers, plus help finding papers that cover exactly what I'm looking for.
CATLLM@reddit
summaries are pretty simple. you can run llama.cpp and just use the built-in webui for something quick and dirty. use something like qwen3.5 9b at Q4. You can fit the whole model in VRAM with enough left over for a 64k KV cache at q8. I think this way takes very little effort to test out to see if it meets your needs.
https://github.com/ggml-org/llama.cpp
https://huggingface.co/unsloth/Qwen3.5-9B-GGUF
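For reference, the quick-and-dirty setup above is roughly one command. The model filename here is a guess at what the linked GGUF repo ships, so adjust it to whatever you actually download:

```shell
# llama-server bundles the built-in web UI; open http://localhost:8080 once it's up.
# -ngl 99 offloads all layers to the GPU, -c 65536 sets the 64k context,
# and the two cache-type flags quantize the KV cache to q8.
llama-server -m Qwen3.5-9B-Q4_K_M.gguf \
    -ngl 99 -c 65536 \
    --cache-type-k q8_0 --cache-type-v q8_0
```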
Adventurous-Paper566@reddit
With your GPU, Gemma 26B A4B in LM Studio at Q4 or Q6 with some experts offloaded to the CPU is a no-brainer.
Red_Redditor_Reddit@reddit
You can run larger MoE models if you have extra RAM. I have a 4090 and 96GB of DDR5. My go-to is either GLM 4.6V or qwen 3.5 122B. If speed is paramount and I'm willing to lose quality, there are smaller models that will fit comfortably in the 24GB of VRAM.
You're probably not going to get the same speed and quality as the online models, at least not unless you're willing to spend lots of $$$$.
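For anyone trying this VRAM/RAM split, llama.cpp can keep the attention weights on the GPU and push the MoE expert tensors into system RAM. A sketch, with a hypothetical model filename and flag names from recent llama.cpp builds:

```shell
# -ngl 99 offloads all layers to the GPU; --n-cpu-moe keeps the expert
# tensors of the first N layers in system RAM (older builds use the tensor
# override -ot "exps=CPU" for roughly the same effect). Tune N until what
# remains fits in your VRAM.
llama-server -m GLM-4.6V-Q4_K_M.gguf -ngl 99 --n-cpu-moe 40 -c 32768
```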
C0rn3j@reddit
How does a local LLM produce 3-2-1 backups?
Spoiler: It doesn't, you need backups, locality is irrelevant.
Several-Tax31@reddit
Claude quality -> No. Local llm -> yes.
Download qwen3.5 35B or qwen3.5 27B quantized, and llama.cpp.
aaAS69@reddit (OP)
alright, is that the best i can use specifically for reasoning, writing and research? i don't need anything else
Several-Tax31@reddit
Reasoning: yes. Writing: I assume you're talking about technical writing, not literary writing. Then yes (the models may or may not be good at literary writing; I never used them for that). Research: this is tricky. You need to configure a web fetch tool for web research, either in llama.cpp or with an agentic framework like opencode or hermes. Also keep in mind you don't get the speeds of cloud models. Searching ten websites in seconds? Forget that, it will take 10 minutes or more. But overall, the models can do it.
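To make the web-fetch point concrete, here's a minimal sketch of what such a tool looks like, using the OpenAI-style function-calling schema that llama.cpp's server and most agentic frameworks accept. The tool name and truncation limit are illustrative choices, not any framework's actual config:

```python
import urllib.request

# Tool schema in the OpenAI function-calling format; a local server that
# supports tool calls is handed this alongside the chat request.
FETCH_TOOL = {
    "type": "function",
    "function": {
        "name": "fetch_url",
        "description": "Download a web page and return its raw text.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}

def fetch_url(url: str, max_bytes: int = 20_000) -> str:
    """Handler the agent loop runs when the model emits a fetch_url call.
    Truncates the page so it fits in a small model's context window."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read(max_bytes).decode("utf-8", errors="replace")
```

A real setup also needs the loop that feeds the tool result back to the model, plus HTML-to-text cleanup; this is only the tool half.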
aaAS69@reddit (OP)
do you think i'd be better off just using chatgpt or glm 5.1
Several-Tax31@reddit
In general, having a local LLM is handy. I don't have to care about tokens or bans or losing my data. I've noticed I use my local models more and more over time; I probably haven't used ChatGPT in months. Also, my agent with qwen 3.5 35B does a lot of the heavy lifting on my local computer: it downloads and installs stuff, checks my hardware logs to see that everything is okay, etc. Overall, a good experience.
But you shouldn't expect everything. For research purposes I use kimi 2.5; that model is very good at searching tens of websites in seconds and identifying issues.
Keep in mind the learning curve is very steep, it's a real rabbit hole. If you like technical stuff, sure, it's fun. If you don't have time, then yeah, possibly go with cheap SOTA Chinese models like kimi or glm.
TutorDry3089@reddit
The biggest advantage of local LLMs is cost and privacy, but neither seems to be a priority for you. I suggest trying other providers like DeepSeek, GLM, or others mentioned in the comments. They're much smarter and faster than anything you could run locally.
Express_Quail_1493@reddit
- step 1: download LM Studio, click search models, and download qwen3.5-9b.
- step 2: load your model. Settings to avoid spilling out of VRAM: KCacheType=q8_0, VCacheType=q8_0 (the KV-cache quantization options). You can safely set the context slider to 128k. Enable developer mode and enable the server.
- step 3: connect it to your chatGPT-like interface (download python and run the command:
pip install open-webui
- run: open-webui serve and navigate to http://localhost:8080
- go to settings and point Open WebUI at LM Studio; the LM Studio URL is http://localhost:1234
- you should now have all the chatGPT-like features
- if you want to code? download opencode and link LM Studio to opencode
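The server-side part of the steps above boils down to a couple of commands (assuming LM Studio's server is already running on port 1234; menu names may differ by Open WebUI version):

```shell
# Install and start Open WebUI, then wire it to LM Studio.
pip install open-webui
open-webui serve
# Browse to http://localhost:8080, then under Settings -> Connections add an
# OpenAI-style endpoint pointing at LM Studio's server (http://localhost:1234).
```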
hoschidude@reddit
Qwen 3.5 locally... or use another cloud API (like deepseek or GLM 5.1).
They offer similar quality and are much cheaper than Anthropic anyway.
Excellent_Koala769@reddit
Depends what Claude model you are used to using. The closest thing to Opus 4.6 is probably GLM 5.1. But that won't fit on your GPU.