5090 April 2026, Philosophical Reasoning & Logic - best models? Plus specific questions (instruct vs training; etc.)
Posted by filmguy123@reddit | LocalLLaMA | View on Reddit | 12 comments
Semi-new to local LLMs and have a series of questions I am hoping people can point me in the right direction with. I am using LM Studio.
As of now, with 32GB VRAM, what are the best models for philosophical reasoning and logic? Discussions, as well as assessing essay drafts; compiling, summarizing, and synthesizing philosophical notes and turning them into coherent outlines or arguments; checking for logical/rational validity as well as factual accuracy; etc.?
- I have played with Gemma-4-31B Q4_K_M and Qwen 3.5 27B Q4_K_M and they seem surprisingly good for local-only models. Is this the best sweet spot for me?
- Gemma-4 is often labeled "IT" - does this mean Instruct + Thinking? Or just Instruct? I would imagine I want thinking, but it does not show the thinking prompt like Qwen does?
^^ Those are my main questions. For those willing/interested, I also have several other questions that follow:
- Are the models labelled "heretic" and "uncensored" a trade-off vs. the default model? I.e., reduced accuracy in exchange for no guardrails? Or should they almost always be preferred?
- There are often redundant copies of the same model in the repository from different uploaders. How do I shop for good ones for my uses? I don't know which uploaders are the most reputable, or even why I might choose one copy over another.
- Unsloth, LM Studio Community, HauHauCS, etc.
- Is Q5_K_M worth the extra VRAM usage for my listed use case, or is it diminishing returns for my usage? (I know I have to balance this against a reduced context window, so in one sense it is personal; on the other hand, knowing whether it is recognized as genuinely useful would help, so I can try to chunk things if needed.)
- Is there any reason for me, with 32GB VRAM, to ever choose a MoE model over a dense one? Since I can't load a 70B or 120B MoE model entirely in VRAM anyway, it seems the only benefit of going to something like Qwen 35B-A3B is if I want to dump in a very large amount of text and actually have it fit the context window with chunking?
Finally I should ask... anything you wish you knew starting out that I should know? I basically know nothing other than the basic interface of LM Studio and choosing a model that fits my VRAM footprint. I understand only the basic premise of context windows.
Leafytreedev@reddit
A lot of your questions can't be answered generally without a lot of personal experimentation. Since Llama 2 times, Q4 has always been the preferred quant for balancing size and quality. Yes, for a 5090, Qwen 3.5 27B and Gemma-4 31B IT are the current kings, but Gemma-4 is relatively new and shipped with lots of problems for many existing backends.
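The size trade-off behind that Q4 rule of thumb is easy to sketch. A minimal back-of-envelope calculation - the bits-per-weight averages below are approximate, since real GGUF files keep some tensors at higher precision, so actual downloads will differ a bit:

```python
# Rough estimate: GGUF weight size ≈ params * bits-per-weight / 8.
# The bpw values are approximate averages for k-quants, not exact.
BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def est_size_gb(params_b: float, quant: str) -> float:
    """Estimated weight size in GB for a model with params_b billion params."""
    return params_b * 1e9 * BPW[quant] / 8 / 1e9

for q in ("Q4_K_M", "Q5_K_M"):
    print(f"31B at {q}: ~{est_size_gb(31, q):.1f} GB")
```

On a 32GB card, the ~3.5 GB gap between Q4_K_M and Q5_K_M for a ~31B model is exactly what you would otherwise spend on KV cache, i.e. context window - which is why Q4 tends to win this trade-off in practice.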
filmguy123@reddit (OP)
Thank you for the help. One big confusion point I can't find an answer to: Gemma-4 is often labeled "IT" - does this mean Instruct + Thinking? Or just Instruct? I would imagine I want thinking, but it does not show the thinking prompt like Qwen does?
Leafytreedev@reddit
The "IT" part is a naming holdover that was popular back in the Llama days (the previous kings of open LLMs). It means that this specific model was further fine-tuned from the base model to better follow instructions. Yes, this model does have thinking, and whether or not you want it to think during inference is in your control.
Still-Wafer1384@reddit
How much system RAM do you have?
filmguy123@reddit (OP)
128GB system RAM. 9950x3D. RTX 5090 32GB VRAM
Still-Wafer1384@reddit
You can run massive MoE models with that setup. Don't limit yourself to thinking that everything needs to fit in your VRAM.
filmguy123@reddit (OP)
I see - so if I am content with lower speeds, I can run a much larger model, 70B+? But if I do this, you suggest running the MoE version?
Still-Wafer1384@reddit
Yes, because MoE models have only a limited number of active experts, making them better suited to a VRAM/RAM split.
Still-Wafer1384@reddit
First question is: why do you want to use a local LLM for your use case? It doesn't sound token-intensive. Does it require automation, or is it a manual process? SOTA models will always be better. I'm also not sure your use case has been tested very widely, but my hunch is that there will be a significant difference between something like Qwen 3.5 27B and e.g. GPT 5.4 or Claude Opus 4.6.
filmguy123@reddit (OP)
Can you clarify what you mean by a SOTA model? First time I have heard the term. Can it be run locally? Let me know what rabbit holes to look down and I will do some research on this! I appreciate it, thank you so much.
Still-Wafer1384@reddit
SOTA = state-of-the-art models, meaning ChatGPT, Claude, Gemini.
filmguy123@reddit (OP)
Got it. I use those too; local is for large documents of private information I deal with (RAG, or pasted into context) that I don't want to send externally to cloud companies.