AI server help, dual K80s, LocalAGI
Posted by JcorpTech@reddit | LocalLLaMA | View on Reddit | 7 comments
Hey everyone,
I’m trying to get LocalAGI set up on my local server to act as a backend replacement for Ollama, mainly because I want search tools, memory, and agent capabilities that Ollama doesn’t currently offer. I’ve been having a tough time getting everything running reliably, and I could use some help or guidance from people more experienced with this setup.
My main issue is that my server uses two K80s. They're old, but I got them very, very cheap and didn't want to upgrade before dipping my toes in. This is my first time working with AI in general, so I want to get some experience before I spend a ton of money on new GPUs. K80s only support up to CUDA 11.4, and while LocalAGI should support that, it still won't use the GPUs. Since each card is technically 2 GPUs on one board, I plan to use each 12GB section for a different thing. Not ideal, but 12GB is more than enough for me to test things out. I can get Ollama to run on CPU, but it also doesn't support K80s, and while I did find a repo (ollama37) built specifically for K80s, it's buggy all around. I also want to note that even in CPU-only mode LocalAGI still doesn't work: I get a variety of errors, mainly backend failures or a warning about legacy GPUs.
I'm guessing it's something silly, but I've been working on it the last few days with no luck following the online documentation. I'm also open to alternatives to LocalAGI; my main goals are an Ollama replacement that can do memory and, ideally, internet search.
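For reference, this is roughly how I was planning to dedicate each 12GB die to its own backend process. It's just a sketch: the commands and ports are placeholders, and it assumes whatever backend I end up with respects CUDA_VISIBLE_DEVICES.

```python
import os
import subprocess

# Placeholder commands -- swap in whatever backend actually ends up working.
# The real mechanism here is CUDA_VISIBLE_DEVICES, which most CUDA apps respect:
# each process only sees the single GPU die it is handed.
BACKENDS = [
    {"gpu": "0", "port": "11434", "cmd": ["ollama", "serve"]},  # first 12GB die
    {"gpu": "1", "port": "11435", "cmd": ["ollama", "serve"]},  # second 12GB die
]

procs = []
for b in BACKENDS:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = b["gpu"]          # pin this process to one die
    env["OLLAMA_HOST"] = f"127.0.0.1:{b['port']}"   # keep the two servers on separate ports
    procs.append(subprocess.Popen(b["cmd"], env=env))

for p in procs:
    p.wait()
```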
Server: Dell PowerEdge R730
- CPUs: 2× Xeon E5-2695 v4 (36 threads total)
- RAM: 160GB DDR4 ECC
- GPUs: 2× NVIDIA K80s (4 total GPUs – 12GB VRAM each)
- OS: Ubuntu with GUI
- Storage: 2TB SSD
offlinesir@reddit
Luckily for you, the price of the K80 has actually increased since you bought it due to the rise in demand. Sell it on eBay; you'll likely get a bit more than you're expecting.
JcorpTech@reddit (OP)
Yea that's what I'm seeing, I'll take a win when I can get it lol.
No-Refrigerator-1672@reddit
One confusing thing about GPUs is that CUDA versions basically mean nothing; everything is determined by "compute capability", basically which instruction set the GPU die has. Kepler's compute capability is too old to support anything AI-related; that should be the reason this "LocalAGI" project refuses to use them despite nominally supporting CUDA 11.4. You can't really do anything useful with them anymore, unfortunately.
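You can check what your cards actually report with something like this (a minimal PyTorch sketch, assuming your torch build still enumerates the devices at all):

```python
import torch

# Compute capability is what actually gates support, not the CUDA toolkit version.
# Kepler (K80) reports 3.7; most current AI backends want Pascal (6.x) or newer,
# and many prebuilt wheels are compiled only for 7.x and up.
if not torch.cuda.is_available():
    print("No usable CUDA device visible to this torch build.")
else:
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        print(f"GPU {i}: {name}, compute capability {major}.{minor}")
```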
JcorpTech@reddit (OP)
Yeah, that's kind of what I'm gathering; I'm probably going to resell. Picked them up for $25 apiece, so at least I should be able to get my money out.
No-Refrigerator-1672@reddit
If you shop for a replacement, I would advise against multi-chip GPUs. Once you get deeper into AI, you'll find that, first, less than 16GB of VRAM is too small to host anything smart, and second, splitting models across multiple GPUs isn't as easy as the docs make it seem. A single chip with 24GB attached to it is the minimum entry point into proper AI.
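To give a feel for what "splitting across GPUs" involves in practice, here's a rough llama-cpp-python sketch; the model path is a placeholder, and it assumes a CUDA-enabled build on cards the library still supports:

```python
from llama_cpp import Llama

# Manual split of one model across two GPU dies. You choose the split yourself,
# the layers still execute sequentially across the cards, and uneven VRAM or a
# slow PCIe link shows up directly in tokens/sec.
llm = Llama(
    model_path="models/some-12b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,          # try to offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # fraction of the model placed on each visible GPU
    n_ctx=4096,
)

out = llm("Explain what tensor_split does in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```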
JcorpTech@reddit (OP)
For the short term I was looking at a m60, like $50 on eBay. I will definitely take your advice but I'm looking to run small stuff till I'm actually hooked. Any thoughts on that card? It's still a 2 GPU system, but supports modern cuda
No-Refrigerator-1672@reddit
I used to use an M40 for LLMs, so based on my experience I can predict that the M60 will be extremely slow. It's two small chips with a low number of old-architecture cores. Your LLM ceiling will be something like a 10-12B Q4 model, you'll only be able to run ollama and llama.cpp, and your generation speed will be in the ballpark of 10 tok/s for a single question, more like 5 tok/s in an agentic environment with tool calling. If you can get it for $50, it would be good enough to get your first project/deployment running, but I can guarantee you'll be itching to replace those cards the moment you start using your local AI daily.
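As a rough sanity check on those numbers: single-question decode speed is mostly bound by memory bandwidth, since every generated token streams more or less the whole set of weights. Back-of-the-envelope, assuming roughly 160 GB/s nominal bandwidth per M60 die and a ~7GB Q4 12B model (both assumed figures, not measurements):

```python
# Back-of-the-envelope decode-speed estimate (assumptions, not measurements):
# each generated token reads roughly the whole quantized model from memory.
bandwidth_gb_s = 160.0   # nominal per-die bandwidth of an M60 (assumed)
model_size_gb = 7.0      # ~12B parameters at Q4 (assumed)

ceiling_tok_s = bandwidth_gb_s / model_size_gb
realistic_tok_s = ceiling_tok_s * 0.5   # old arch and overhead rarely hit the ceiling

print(f"theoretical ceiling: ~{ceiling_tok_s:.0f} tok/s")
print(f"realistic ballpark:  ~{realistic_tok_s:.0f} tok/s")
```

That lands right around the 10 tok/s figure above, which is why the card feels slow in daily use no matter how it's configured.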