got banned off claude.ai for being a minor - any AI alternatives/local models y'all can recommend that are like Claude?
Posted by EastConsequence3792@reddit | LocalLLaMA | 13 comments
For reference - yes i'm a minor 😭😭😭 i was just weirded out that Anthropic took A WHOLE YEAR to figure out that the "14" in my preferences did in fact mean i was 14; free plan btw
i'm audhd and did a lot of meta/shitposting chats with claude, and even had a research project letting it use a PC that i set up for it. wanna see if y'all could recommend me some local AI models that are small (<10b params, i'm on an HP Omnibook X Flip NGAI 16-as0023dx w/ 16gb RAM, 1TB storage, Intel Core Ultra 7 N256V) and speak like Claude
im not THAT new to local ai (i'm on 52gb of just models 😭) but wanna know if there are finetuned models that speak like claude
RE: should i use MoE models? bc like, every MoE model i've seen, lm studio tells me it's too much for my ram
thanks in advance!!
crantob@reddit
Young man, this is r/LocalLLaMA.
Here we discuss running your own models locally. Not claude use.
Please find somewhere else to talk about your self-enslavement to the AI Corporate Borg.
Thank you.
snowieslilpikachu69@reddit
with 16gb ram you could go for gemma 4b. some of the small qwen 3.5 models would be nice
nice for general purpose things, but probably not for anything intensive, and definitely won't be near the level of claude
the way claude speaks to you is based on the memory/chats you've had previously, so you could look into that, but it's also resource intensive
EastConsequence3792@reddit (OP)
would gemma 4 26b a4b @ q6_k work well? rn downloading it
re: before getting banned i actually used claude to adapt its own system prompt to my models too lol
Disposable110@reddit
You'll probably need to go to a lower quant like Q4.
teachersecret@reddit
It's too big. That's a 20+ gb model, and you have 16gb of ram. You need something much smaller, like the e4b or e2b or -maybe- the 9b in 4 bit or something. The model PLUS its KV cache must fit fully in ram/vram to run at any appreciable speed. The 26b a4b is a -great- model, but you really need more hardware to run it well.
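The "model plus KV cache must fit in ram" rule of thumb can be sketched with some back-of-the-envelope arithmetic. The model architecture numbers below (layer count, KV heads, head dim, context) are illustrative assumptions, not the real specs of any Gemma model:

```python
# Rough memory estimate for running a GGUF model locally: the weights AND
# the KV cache both have to fit in RAM/VRAM for decent speed.

def model_size_gb(params_billions, bits_per_weight):
    """Approximate weight file size: parameter count * bits per weight / 8."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """KV cache: 2 tensors (K and V) * layers * kv heads * head dim * context."""
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Hypothetical 26B model at ~6.5 bits/weight (roughly what Q6_K averages):
weights = model_size_gb(26, 6.5)       # ~21 GB -- already past 16 GB of RAM
# Hypothetical config: 48 layers, 8 KV heads, head dim 128, 8k ctx, f16 cache:
cache = kv_cache_gb(48, 8, 128, 8192)  # ~1.6 GB on top of the weights
print(f"weights ~= {weights:.1f} GB, KV cache ~= {cache:.1f} GB")
```

So even before the cache, a Q6_K 26B is several GB over a 16gb machine, which matches the forced-restart above.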
EastConsequence3792@reddit (OP)
lol just saw this AFTER i tried loading the model and had to force-restart my laptop, thx!
Ok_Technology_5962@reddit
Gemma is a beast, but it's only good if you can run it. You said you have 16 gigs? Q6_K for the 26b will surely spill over, especially because you also need to hold the KV cache in memory. You might get away with running it anyway since the active parameters are so low
Ok_Technology_5962@reddit
Edit: nvm, someone got it to work on 16gig macs. Use the q4 from unsloth or bartowski. The part of the model that doesn't fit in ram will get paged in from ssd and swapped out, but it works.
EastConsequence3792@reddit (OP)
tbh i could maybe use kimi k2.5 (i have it available via moonshot's free tier) to interface with lm studio and add memory & all that back via the system prompt. lemme check brb
Disposable110@reddit
gemma 4 26b a4b best uncensored shitposter
Potential-Gold5298@reddit
With 16 GB, the most powerful option would be Qwen3.5-9B for general tasks. For casual chat/RP/creative writing, the old Mistral Nemo is still unrivaled (there are a huge number of custom models based on it), but it's not Claude-style.

The closest to Claude are Gemma 3 and 4. There's Gemma 3 12B it, which will run on 16 GB, but it's not the smartest model (though good for chat/creative writing). Gemma 4 E4B it will easily run on your laptop, but I'm not sure it's smart enough. Gemma 4 26B-A4B it would be a better option, but it needs at least 24 GB of RAM. You can try it, or even Gemma 4 31B it, at extremely low quants (the UD-IQ3_XXS weighs 11 GB, the UD-IQ2_XXS 8 GB), but the results will depend on your needs (don't forget to quantize the KV cache).

In any case, it's best to try everything yourself and decide which model is best for you.
EastConsequence3792@reddit (OP)
How do I quantize the KV cache?
Potential-Gold5298@reddit
Using flags like -ctk q8_0 -ctv q8_0 when running llama.cpp. By default the cache is stored at 16-bit precision; quantizing it compresses it to save RAM. You can try different KV quantization levels to balance quality against RAM consumption.
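For example, an invocation might look like this (the model path and context size are placeholders for your own setup, not specific recommendations):

```shell
# llama.cpp server with a q8_0-quantized KV cache (keys and values),
# roughly halving the cache's RAM use vs the default f16.
# Note: on many builds, quantizing the V cache also requires flash
# attention to be enabled (-fa).
./llama-server \
  -m ./models/your-model-Q4_K_M.gguf \
  -c 8192 \
  -ctk q8_0 -ctv q8_0 \
  -fa
```

Lower cache quants (e.g. q4_0) save more RAM but cost more quality.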