Hosting a private LLM for a client. Does this setup make sense?
Posted by nullReferenceError@reddit | LocalLLaMA | 32 comments
I’m working with a client who wants to use AI to analyze sensitive business data, so hosted services like OpenAI or Anthropic are off the table due to privacy concerns. I’ve used AI in projects before, but this is my first time hosting an LLM myself.
The initial use case is pretty straightforward: they want to upload CSVs and have the AI analyze the data. In the future, they may want to fine-tune a model on their own datasets.
Here’s my current plan. Would love any feedback or gotchas I might be missing:
- RunPod to host the LLM (planning to use LLaMA via Ollama)
- Vercel’s Chatbot UI forked as the front end, modified to hit the RunPod-hosted API
Eventually I’ll build out a backend to handle CSV uploads and prompt construction, but for now I’m just aiming to get the chat UI talking to the model.
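For concreteness, here's a rough sketch of the kind of backend call I have in mind (assuming the pod exposes Ollama's stock `/api/chat` endpoint; the proxy hostname, model tag, and CSV handling are all placeholders, not a working deployment):

```typescript
// Sketch: turn an uploaded CSV into a prompt and send it to Ollama on RunPod.
// POD_HOST is a hypothetical RunPod proxy URL, not a real deployment.
import { readFileSync } from "node:fs";

const POD_HOST = "https://POD_ID-11434.proxy.runpod.net"; // placeholder

async function analyzeCsv(path: string, question: string): Promise<string> {
  // Naive CSV handling for the sketch; a real backend would use a proper
  // parser and chunk large files instead of inlining everything.
  const rows = readFileSync(path, "utf8").split("\n").slice(0, 200);

  const res = await fetch(`${POD_HOST}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1", // placeholder model tag
      stream: false,
      messages: [
        { role: "system", content: "You are a data analyst. Answer only from the CSV provided." },
        { role: "user", content: `CSV data:\n${rows.join("\n")}\n\nQuestion: ${question}` },
      ],
    }),
  });
  const data = await res.json();
  return data.message.content; // shape of Ollama's non-streaming chat response
}

analyzeCsv("./export.csv", "Which region had the highest revenue?").then(console.log);
```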
Anyone done something similar or have tips on optimizing this setup?
sshan@reddit
Are you sure this is a good idea? If you were actually doing this on prem or even in their private cloud I'd say sure...
Enterprise/Paid plans of the major players don't train on your data. They have a privacy policy.
This is you spinning up a custom application and hosting it on a 3rd party. Your security skills are far worse than Google or Microsoft's...
Nomski88@reddit
Whoops, sorry your data got leaked/hacked. Here's your $18 from the class action lawsuit...
Individual_Holiday_9@reddit
^ this is why I’d never trust OpenAI or any of these companies
They have no respect for privacy or copyright
I subscribe to Gemini AND ChatGPT. They’re great tools, but I’d never use them for anything truly secret.
sshan@reddit
Sure, that’s a real risk. But every company uses cloud providers. Including those with very sensitive data.
reneheuven@reddit
I have a similar requirement at the moment. Though Google/Microsoft have a privacy policy, my prospect does not trust them with business-sensitive data + has no money to start lawsuits against these giants. Thus yes, it makes perfect sense to host on premise or within a private cloud. And why would Microsoft or Google know more about cyber security? With the right experts hired, we can be as good as or even better than Google/Microsoft.
Bonananana@reddit
You have no idea how wrong you are.
Former-Ad-5757@reddit
But you are saying on premise or private cloud, while the question is about RunPod, which is afaik neither. I haven't had a lawyer look at how RunPod handles sensitive data. But I have had a lawyer look at how my private cloud handles it and how Google/Microsoft handle it.
For me, RunPod is just one of many companies I can use for non-sensitive projects.
nullReferenceError@reddit (OP)
I’m not sure; that’s why I’m asking. Good point re: enterprise plans. I’ll look into that. Thanks!
loyalekoinu88@reddit
You’re using the cloud for processing. If you’re doing that, you might as well go the Azure route, which already has deployable private LLMs in the cloud, with likely better security than you’d figure out on your own.
Why not build an on-prem cloud?
curious-bonsai@reddit
Valid points, cost matters, but so does sleeping at night. Have you thought about a managed private environment? Something like Bluehost could be a safer foundation before things get messy.
loyalekoinu88@reddit
Bluehost has servers capable of running LLMs?
What about on-prem would prevent you from sleeping?
curious-bonsai@reddit
No no, I wouldn’t spin up a 65B model on it, but not everything needs to be torch.distributed with A100s either.
loyalekoinu88@reddit
For sure. Since I don’t know the details, like what model you used and what the criteria are for a working proof of concept, it’s hard to understand what you’re trying to run.
nullReferenceError@reddit (OP)
Good point, thank you. I assumed RunPod's services are a lot cheaper than Azure, but maybe I'm wrong.
loyalekoinu88@reddit
With Azure you’re basically paying for private endpoints with the “security” more or less done for you. If security of information and having information stored privately is important then to me it’s the cost of doing business.
HOWEVER, how much data is actually being processed? How often? And ultimately does speed actually matter?
nullReferenceError@reddit (OP)
I think initially it's not a lot of data, something like 70k records max, and that's IF they dump their entire db. Most likely it'll be much smaller sets of data to be analyzed. I'm guessing a few times a week. I think speed does matter.
loyalekoinu88@reddit
A few times a week to me doesn’t really seem like speed is essential. What type of analysis are they trying to do with the data? Remember some models are better than others at certain tasks. Did you perform a proof of concept with them?
reneheuven@reddit
Given you want to host this yourself: how do you achieve a scalable private AI cloud solution, one that avoids reinstalling when moving to a more performant instance? I need RAG, PDF file uploads, MCP and REST APIs. A chat interface is a nice-to-have. All EU based: GDPR, ISO 27001, SOC 2 compliant. No US-based owner for the hosting provider. Also no Chinese or Russian ;).
No_Afternoon_4260@reddit
Then good luck, hope you like spending money! I know Hetzner in Germany, and maybe there's something to do with OVH in France; last time I checked they were kind of expensive. I'm curious if someone knows better providers.
pontymython@reddit
Just get a Vertex/Bedrock account and use the enterprise tier of the big cloud providers. Privacy guarantees are built in.
nullReferenceError@reddit (OP)
Aren't those bigger cloud providers difficult to set up?
pontymython@reddit
Probably not as complex as what you're proposing to home roll, especially when you think about backup and availability strategies. Something like open-webui using the OpenAI API is a breeze; its persistence is SQLite as standard, so just a volume is needed, or plug in a database.
Gemini 2.5 or o3 can help you write the Pulumi / CloudFormation / insert-your-IaC-here.
Librechat is probably my sweet spot, but it has at least a few dependent services, so it's not as neat as open-webui.
nullReferenceError@reddit (OP)
Interesting. Wouldn't something like open-webui using the OpenAI API still have privacy issues, since they can still train off of the data?
pontymython@reddit
I didn't think they trained on API data, just the public chat products, and found this: https://community.openai.com/t/does-open-ai-api-use-api-data-for-training/659053
For the even more concerned, use Azure's OpenAI service which gives you a slightly more managed version, although tbh I'm not sure of the real difference besides MS being responsible for your data security instead of OpenAI.
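Client-side it's the same call either way; rough sketch with the openai npm package (the model name is a placeholder, and the commented-out baseURL is how you'd point at any OpenAI-compatible gateway):

```typescript
// Sketch: one chat-completions call that works against OpenAI's API or any
// OpenAI-compatible endpoint. The openai package also ships an AzureOpenAI
// client if you go the Azure OpenAI route.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // baseURL: "https://your-gateway.example.com/v1", // hypothetical compatible endpoint
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "gpt-4o", // placeholder model name
    messages: [{ role: "user", content: "Summarize the attached sales figures." }],
  });
  console.log(completion.choices[0].message.content);
}

main();
```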
BacklashLaRue@reddit
PowerSpec gaming machine with a 16 GB video card running Ollama, DeepSeek (or another model), and AnythingLLM to put the data into a vector database, all disconnected from the world. I did mine for just under $2200 and it runs great. We have loads of data that cannot be in a public model.
pab_guy@reddit
You are concerned about privacy but intend to use a cloud service to host the LLM? TF?
pab_guy@reddit
That’s… not private. JFC you don’t know what you are doing at all.
Former-Ad-5757@reddit
Why Ollama? I would just use llama.cpp's server directly, or even better something like vLLM.
For me, Ollama has the wrong attitude regarding defaults: something might get fixed now and be reversed in a newer release, or a newer release might pick up strange new defaults.
Until they change their attitude I can't take them seriously as something to build apps on.
nullReferenceError@reddit (OP)
Good questions. I just defaulted to it, open to suggestions.
No_Afternoon_4260@reddit
vLLM, tensor parallelism, concurrent requests... just a few keywords that might interest you; that's how you optimise your setup.
llama.cpp also does concurrent requests with a shared ctx; Ollama, I have no idea.
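Both vLLM and llama.cpp's llama-server expose an OpenAI-compatible /v1/chat/completions route, so a quick concurrency smoke test is just parallel fetches. Rough sketch (port and model name are placeholders):

```typescript
// Sketch: fire several concurrent requests at a self-hosted
// OpenAI-compatible server (vLLM or llama.cpp's llama-server).
const BASE_URL = "http://localhost:8000/v1"; // vLLM's default port; llama-server often uses 8080

async function ask(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "placeholder-model",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Concurrent requests are where vLLM's batching pays off.
const prompts = ["Q1 summary?", "Top region?", "Any anomalies?"];
Promise.all(prompts.map(ask)).then((answers) => answers.forEach((a) => console.log(a)));
```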
AdamDhahabi@reddit
I would go for Open WebUI, it has a ton of features, especially RAG.
iamofmyown@reddit
We run a small server with a basic-level old GPU but lots of RAM to handle a use case like this. It's serving in production for a very small user base as a Q&A chatbot over internal docs.