Building a local RAG server
Posted by autonom1a@reddit | LocalLLaMA | 15 comments
Hi. Corporate wants me to build a local RAG server: 50-100 concurrent interactions with the model a few times a day in the first stage, and 100-1000 once deployed to production.
I want to understand the hardware stack and what it costs. Maybe some options.
Halp.
huzbum@reddit
What does your current stack look like? You might be able to integrate existing technologies like Elasticsearch or Postgres.
Are you using any cloud services like AWS, or do you plan to put physical hardware on site? Do you already have hardware on site? What are the uptime requirements? There is a big difference between “it would be nice if this thing was always working” and guaranteed 5 9’s.
autonom1a@reddit (OP)
No stack, gotta build it from the ground up. No hardware specific to this task ATM. It has to be placed in the office, no outside placement. Uptime 24/7.
huzbum@reddit
Ok, but is this going to be a service for people in the office or something with subscriptions and a service level agreement?
There is a big difference between it’s always on and probably never goes down, vs 5 9’s. Like the difference between a desktop in the corner that probably stays up for a year straight, and server racks with redundant power supplies, UPS, backup generators, redundant network connections, etc.
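To put numbers on that gap, here is a rough sketch of the downtime budget each availability level allows per year. The function name is made up for illustration, and the year is assumed to be 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes of downtime per year")
```

Five nines works out to roughly five minutes of downtime a year, which is why it usually implies redundant everything rather than a single box.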
autonom1a@reddit (OP)
Yes, it has to be always on, ready to interact 24/7 under a strict SLA: immediate response and able to handle usage surges.
huzbum@reddit
Also, does “100 concurrent users” mean 100 people hit enter at the same moment, or that there are 100 people with access who will spread their use out over the day?
And how smart does it need to be? Any idea what model you might want to run?
autonom1a@reddit (OP)
Mostly spread throughout the day, but sometimes they can do that almost simultaneously.
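The two cases size very differently. A back-of-the-envelope sketch using Little's law (average requests in flight = arrival rate × time per request) shows the gap. The usage figures below (5 requests per user per hour, 20 seconds per response) are made-up assumptions, not numbers from this thread.

```python
def expected_in_flight(requests_per_hour: float, seconds_per_request: float) -> float:
    """Little's law: average number of requests being served at once."""
    arrival_rate_per_second = requests_per_hour / 3600.0
    return arrival_rate_per_second * seconds_per_request


if __name__ == "__main__":
    users = 100
    # Hypothetical usage pattern: each user sends 5 requests per hour,
    # and each response takes about 20 seconds to generate.
    spread_out = expected_in_flight(requests_per_hour=users * 5, seconds_per_request=20)
    print(f"Spread over the day: about {spread_out:.1f} requests in flight on average")
    print(f"Everyone hits enter at once: {users} requests in flight")
```

Under those assumptions the average load is a handful of simultaneous requests, while the "almost simultaneously" case is the one that drives hardware sizing.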
kantydir@reddit
What model(s) do you have in mind? That many concurrent requests will probably require running the model in data parallel mode or balancing between several servers if you want a decent interactive user experience.
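As a rough illustration of the "balancing between several servers" option, here is a minimal sketch of a least-loaded dispatcher. The class names and server addresses are hypothetical, and it is not thread-safe; a real deployment would more likely put an existing reverse proxy or load balancer in front of the inference servers.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    url: str           # hypothetical address of one inference server
    in_flight: int = 0  # requests currently being served by this server


class LeastLoadedBalancer:
    """Sends each new request to the server with the fewest requests in flight."""

    def __init__(self, urls):
        self.backends = [Backend(u) for u in urls]

    def acquire(self) -> Backend:
        # min() returns the first backend on ties, so order breaks ties.
        backend = min(self.backends, key=lambda b: b.in_flight)
        backend.in_flight += 1
        return backend

    def release(self, backend: Backend) -> None:
        # Call this when the request has finished (or failed).
        backend.in_flight -= 1


if __name__ == "__main__":
    lb = LeastLoadedBalancer(["http://gpu-node-1:8000", "http://gpu-node-2:8000"])
    first = lb.acquire()
    second = lb.acquire()
    print(first.url, second.url)  # the two requests land on different servers
    lb.release(first)
    lb.release(second)
```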
autonom1a@reddit (OP)
I think:
Llama 3.1 32-72b
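For a first feel of what that size range means for hardware, here is a rough sketch of the memory needed just to hold the weights at different precisions. It counts weights only: KV cache, activations, and runtime overhead are ignored, and those grow with the number of concurrent requests. The function name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for size in (32, 72):
        for bits in (16, 8, 4):
            print(f"{size}B at {bits}-bit: about {weight_memory_gb(size, bits):.0f} GB for weights")
```

So a 72B model is on the order of 144 GB at 16-bit and 36 GB at 4-bit before any serving overhead, which already points to multiple GPUs at the upper end of that range.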
korino11@reddit
RAG is dead. It's useless in 90% of situations. It has stupid embeddings. Think about other solutions...
autonom1a@reddit (OP)
For example?
DinoAmino@reddit
Don't listen to them. They know not what they say. And what they say is not well liked: 6 years on Reddit with negative karma.
korino11@reddit
Dude, did you ever think about WHAT RAG gives you and WHAT it cannot give you?
ALL that is not for RAG!
DinoAmino@reddit
I think about it all the time with my hybrid vector/graph RAG, dude. What you list does not tell me RAG is dead. You list what needs to be dealt with when using RAG. No different from anything else in computing.
korino11@reddit
Most of that is solved by other people. And RAG is stupid...
korino11@reddit