Using locally hosted LLMs for the workplace
Posted by Relevant-Cash-7270@reddit | LocalLLaMA | 3 comments
We've been discussing using AI to manage workflows in our company, but this might involve feeding sensitive database data to the AI.
Is using a local LLM, say for one department, reasonable for this?
I ask because local LLMs have been evolving rapidly, and I'd like to know if the state of the art is there yet.
samehmeh@reddit
For a single department that needs to automate workflows with sensitive data, local LLMs are reasonable now. Models like Llama 3 70B or Qwen 72B on a decent GPU server handle structured extraction and summarization well enough for internal use. The gap vs cloud models narrows fast when you add RAG with your domain data. The main bottleneck is someone owning the infra and keeping models updated, not model quality. But then again, you'd need to consider which LLM to use from a security point of view.
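To make the "local LLMs handle summarization" point concrete, here is a minimal sketch of calling a locally hosted model. It assumes an OpenAI-compatible server (llama.cpp's server and Ollama both expose one) running at a hypothetical `localhost:8080`; the model name and port are assumptions, not anything from the thread.

```python
import json
import urllib.request

# Assumed local endpoint; adjust host/port to wherever your server runs.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

def build_request(document: str, model: str = "llama-3-70b-instruct") -> dict:
    """Build a chat-completion payload asking the model to summarize a document."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize internal documents concisely."},
            {"role": "user", "content": document},
        ],
        "temperature": 0.2,  # low temperature for more repeatable extraction
    }

def summarize(document: str) -> str:
    """POST the payload to the local server and return the model's reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(document)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint speaks the same protocol as the cloud APIs, workflow code written against it can be pointed at a cloud model later (or vice versa) without rewrites.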
m18coppola@reddit
Yes and no... The two biggest factors you need to consider are:

- how many users will be making requests at the same time
- how complex those requests are

These factors will wildly change what hardware you need. If you only expect a handful of people using it at the same time, you might get away with a hearty desktop with a couple RTX **90 cards and a smaller model. If you expect >6 users concurrently making complex requests, you might want to consider shelling out the money for something a little more enterprise grade.
gulliema@reddit
If it's a machine that runs on the company network: sure, why not? Just make sure you test and check it again and again, and I'd make sure the machine is isolated from the internet.
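The "isolated from the internet" advice can be enforced at the host firewall rather than by trust. Below is a hypothetical egress-lockdown sketch in iptables syntax; the `10.0.0.0/8` internal subnet is an assumed placeholder, and your environment may use nftables or a network-level firewall instead.

```shell
# Default-deny all outbound traffic from the inference host.
iptables -P OUTPUT DROP
# Allow replies to connections that clients on the LAN initiated.
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow loopback (the model server talking to itself).
iptables -A OUTPUT -o lo -j ACCEPT
# Allow traffic to the internal subnet only (assumed to be 10.0.0.0/8).
iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT
```

With rules like these, the model can serve the department over the LAN but cannot phone home or fetch anything from the internet, which also keeps accidental data exfiltration off the table.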