Requesting advice on local AI setup for academic use

Posted by The_Paradoxy@reddit | LocalLLaMA | View on Reddit | 15 comments

I'm about to do a clean install of Ubuntu 26.04 on a desktop that has a 5060 Ti 16 GB and a 4060 Ti 16 GB. Can you help me work out the best local AI setup for my use cases? All advice, no matter how minimal, is greatly appreciated 🙏 thank you!

My most immediate question is vLLM vs llama.cpp, and with what settings? But I'm also trying to figure out what sort of agent workflow makes sense for me. I'm concerned about security, if that makes a difference between llama.cpp and vLLM, or between the different agent harnesses. I've heard that I should disable thinking for Hermes, but would that also make sense for opencode? Is it possible to do multi-agent orchestration on my hardware, or do I need to dream a little smaller? And if I want to be able to SSH into my desktop remotely to use agents, what are best practices for security?
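For context on how I was planning to wire things up: since both llama.cpp's server and vLLM expose an OpenAI-compatible API, my plan was to write the agent side against that API so it doesn't care which backend wins. Something like this smoke test is what I had in mind (the port and model name are placeholders for my setup, not a recommendation):

```python
# Minimal smoke test against a local OpenAI-compatible server.
# Both llama.cpp's server and vLLM speak this API, so the client
# code shouldn't need to change when I switch backends.
# Port and model name below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # served model name; depends on launch flags
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```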

Full specs

GPU 1: 5060 Ti 16 GB on PCIe Gen 5 x16

GPU 2: 4060 Ti 16 GB on PCIe Gen 4 x4

CPU: 7950X3D

Motherboard: B650 Aorus Pro

USE CASES:

Code documentation and generation:

- I do research using computational game-theoretic models. My code makes heavy use of NumPy and Numba JIT compilation, and it is written for performance (parallelizing as many independent computations as possible) rather than for easy readability/interpretability (a toy example of the style is sketched after this list). My understanding is that, if I want actually useful code assistance, the first thing I need to do is generate clear documentation of what my code is doing and how it implements a model as described in a paper.

- Once I've gotten the code reasonably documented, I'm hoping I can get decent assistance extending my models without butchering all of the optimizations I've put into the code. Any advice on agentic workflows for coding complex dynamical systems, or for any context in which you make relatively abstract use of array operations, is much appreciated.
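To give a concrete flavor of the style I mean, here's a toy example (not my actual model, just the shape of the code: vectorized NumPy plus Numba-parallelized loops over independent computations):

```python
# Toy example of the coding style in my models: vectorized NumPy plus
# Numba-parallelized loops over independent per-agent computations.
# Not my real code, just the flavor of it.
import numpy as np
from numba import njit, prange

@njit(parallel=True, cache=True)
def best_responses(payoffs, strategies):
    """For each agent, pick the strategy with the highest expected payoff.

    payoffs:    (n_agents, n_strategies) array of expected payoffs
    strategies: preallocated (n_agents,) int array, written in place
    """
    n_agents = payoffs.shape[0]
    for i in prange(n_agents):  # each agent's computation is independent
        strategies[i] = np.argmax(payoffs[i])
    return strategies

payoffs = np.random.rand(10_000, 8)
strategies = np.empty(10_000, dtype=np.int64)
print(best_responses(payoffs, strategies)[:5])
```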

Research writing assistance:

- I am hoping that I can use an agent to search the Internet for relevant background literature and to compile summaries of what it finds.

--- However, I am concerned about security for this. How much of an issue is prompt injection for local AI? Are there any best practices for using an agent for broad web search?

--- I'm also wondering if anyone has advice on prompting for this kind of long-form work. In my experience, LLMs tend to focus more on keyword similarities than on a paper's actual content. This is a big issue for me since I do interdisciplinary research, where the most relevant terms on a topic differ between researchers who are trained as economists, anthropologists, cognitive scientists, etc. I'd really appreciate any advice on how to get a model to pay attention to the bigger picture and what conclusions are being drawn, and to not over-index on keywords or whatever happens to be said in the first couple of pages of a paper.
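One idea I've been toying with, completely untested, is to force a structured summary of the paper's argument before the model is allowed to touch terminology at all. The prompt wording below is just a first draft; I'd love to hear if anyone has made something like this work:

```python
# Untested idea: make the model summarize a paper's argument structure
# before it may discuss terminology, to fight keyword over-indexing.
# The prompt wording is a first draft, not something proven to work.
SUMMARY_PROMPT = """Read the paper below and answer in this order:
1. What question is the paper trying to answer?
2. What is the main conclusion, in one paragraph?
3. What evidence or model supports that conclusion?
4. Only now: list the field-specific terms the authors use, and give
   each a plain-language gloss so it can be matched across disciplines.

Do not rely on the abstract or first pages; base your answers on the
whole text.

Paper:
{paper_text}
"""

def build_summary_request(paper_text: str) -> list[dict]:
    """Build chat messages for the structured-summary pass."""
    return [
        {"role": "system", "content": "You summarize arguments, not keywords."},
        {"role": "user", "content": SUMMARY_PROMPT.format(paper_text=paper_text)},
    ]
```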

(Possible use case) Question answering for students:

- I teach an intro data science class and often spend time responding to student emails by simply telling them where to look in the lecture notes, or by giving them Socratic questions to help them think through their problem. I'd love to set up an email address that students can use to ask an AI questions, where the AI has access to the lecture notes and has learned not to just give students the answers but to help them think through the problem (a rough sketch of what I'm imagining is at the end of the post). I only have about 100 students a semester, so I'm not too concerned about heavy traffic. My biggest concerns are:

--- All of the local models I can run will have a bias towards just giving students the answers rather than helping them think, no matter how much I try to prompt them to reply to emails in a particular way.

--- This feels like it will be asking for trouble from students who are just trying to cause problems. If I give an agent access to an email address, are students going to be able to prompt it to change the password for the email address?
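In case it's easier to critique something concrete, here's the rough shape of what I was imagining for the email bot. All the hosts, credentials, and the model name are placeholders, and the Socratic system prompt is a first draft:

```python
# Rough sketch of the student Q&A loop I'm imagining: poll an inbox,
# answer with a Socratic system prompt plus lecture notes as context.
# IMAP/SMTP hosts, credentials, and the model name are placeholders.
# Note: this bot only ever reads and sends mail; nothing here can
# change account settings, so a prompt alone can't reset the password.
import imaplib
import smtplib
from email import message_from_bytes
from email.message import EmailMessage

from openai import OpenAI

LECTURE_NOTES = open("lecture_notes.txt").read()  # placeholder corpus

SYSTEM_PROMPT = (
    "You are a teaching assistant for an intro data science class. "
    "Never give final answers. Point students to the relevant section "
    "of the lecture notes and ask one Socratic question that helps them "
    "take the next step themselves.\n\nLecture notes:\n" + LECTURE_NOTES
)

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def answer(question: str) -> str:
    """Ask the local model for a Socratic reply to one student question."""
    resp = client.chat.completions.create(
        model="local-model",  # placeholder served-model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        max_tokens=512,
    )
    return resp.choices[0].message.content

def extract_text(msg) -> str:
    """Pull the plain-text body out of a (possibly multipart) message."""
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                return part.get_payload(decode=True).decode(errors="replace")
        return ""
    return msg.get_payload(decode=True).decode(errors="replace")

def check_inbox():
    """Answer every unread message, then log out."""
    imap = imaplib.IMAP4_SSL("imap.example.edu")      # placeholder host
    imap.login("ta-bot@example.edu", "app-password")  # placeholder creds
    imap.select("INBOX")
    _, data = imap.search(None, "UNSEEN")
    for num in data[0].split():
        _, msg_data = imap.fetch(num, "(RFC822)")
        msg = message_from_bytes(msg_data[0][1])
        reply = EmailMessage()
        reply["To"] = msg["From"]
        reply["From"] = "ta-bot@example.edu"
        reply["Subject"] = "Re: " + (msg["Subject"] or "")
        reply.set_content(answer(extract_text(msg)))
        with smtplib.SMTP_SSL("smtp.example.edu") as smtp:  # placeholder
            smtp.login("ta-bot@example.edu", "app-password")
            smtp.send_message(reply)
    imap.logout()

if __name__ == "__main__":
    check_inbox()
```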