how would you set up a local llm server for a business of 7 people?

Posted by snowieslilpikachu69@reddit | LocalLLaMA | View on Reddit | 42 comments

Okay, so I've been lurking this sub for some time, and I run the occasional small 2-8B model on my laptop (not the best) for fun.

But say my role at a company is to set up a local LLM, since we obviously don't want confidential data going to other companies.

Main use cases would be queries, RAG, and general use; nothing crazy, except for maybe 1 or 2 people using it for programming.
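For a shared setup like this, one common pattern is a single machine running an OpenAI-compatible HTTP server that everyone on the LAN points their clients at. A minimal sketch using llama.cpp's `llama-server`; the model path and flag values here are placeholders to adapt, not a tested recommendation:

```shell
# Serve one shared model over the LAN with llama.cpp's
# OpenAI-compatible server (model path and flag values are placeholders).
#   -m          GGUF model file (whatever quant you settle on)
#   -c          total context window, split across the parallel slots
#   --parallel  number of simultaneous request slots (4 -> 8k ctx each here)
#   -ngl        offload all layers to the GPU
llama-server -m ./models/model-q4_k_m.gguf -c 32768 --parallel 4 \
  -ngl 99 --host 0.0.0.0 --port 8080
```

Clients (chat UIs, IDE plugins) then talk to `http://<server-ip>:8080/v1` like any OpenAI endpoint. Note that with llama.cpp the total context is divided among the parallel slots, which matters once several people hit it at once.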

I was thinking of Gemma 3 27B or Qwen3 30B/32B. How do these models scale with concurrent users? I know I could run one of these on a 5090 (plus some extra) or a 48GB MacBook Pro with unified memory, but I'm not sure how they scale with multiple users.
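The main constraint on concurrency is usually KV-cache memory: each in-flight request needs its own cache on top of the weights. A back-of-envelope sketch below; the layer/head counts are illustrative placeholders, not exact specs for any particular model, and real servers (vLLM, llama.cpp) add overhead on top:

```python
# Back-of-envelope KV-cache sizing: how many concurrent users fit on one
# GPU alongside the weights. Model dimensions below are illustrative
# placeholders for a ~27B dense model, NOT exact published specs.

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # 2x for separate K and V tensors; fp16 cache = 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

layers, kv_heads, hdim = 48, 8, 128   # assumed GQA-style architecture
per_tok = kv_cache_bytes_per_token(layers, kv_heads, hdim)

vram_gb = 32        # e.g. an RTX 5090
weights_gb = 16     # rough figure for 4-bit quantized ~27B weights
ctx_per_user = 8192 # context budget per concurrent request

free_bytes = (vram_gb - weights_gb) * 1024**3
per_user = per_tok * ctx_per_user
print(f"KV cache per token: {per_tok / 1024:.0f} KiB")
print(f"KV cache per 8k-context user: {per_user / 1024**3:.2f} GiB")
print(f"Concurrent 8k-context users that fit: {free_bytes // per_user}")
```

Under these assumptions, roughly ten 8k-context requests fit in the leftover VRAM, which is comfortable for 7 people who are rarely all generating at once. The same arithmetic on a 48GB unified-memory Mac leaves more headroom for cache, but per-user generation speed drops faster there because decoding is memory-bandwidth-bound.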