offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice

Posted by BuddyBotBuilder@reddit | LocalLLaMA | 77 comments

Hi everyone. I’m probably posting slightly outside the usual scope here, but I’m hoping some of you might have advice.

I’m Gen-X with no formal programming background, but I’ve been building a small AI companion project for my husband. He’s mostly quadriplegic (paralyzed legs and limited use of his hands) and spends most of the day alone at home while I’m at work. We live in a very rural area with no close neighbors or nearby friends, and the isolation has been hard on him.

So I decided to try building him a companion robot.

For the past year I’ve been scavenging parts and learning as I go. The goal is a fully local, offline mobile robot built on a small power-wheelchair base (two 24V batteries) that can talk with him and keep him company.

Current prototype setup:

LLM (conversation):

•   Mistral-7B-Instruct via llama.cpp

•   Running on a free Lenovo ThinkPad

•   Intel i5 @ 1.6 GHz

•   8 GB RAM

Speech Recognition:

•   Jetson Nano running faster-whisper (base, INT8)

Text-to-Speech:

•   Piper TTS – en_US-ryan-medium

Right now the output just goes over HDMI to a TV while I test everything.

The main limitation is the ThinkPad’s 8 GB RAM, so I’m restricted to smaller quantized models.
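If it helps, here's the back-of-envelope math I've been using to convince myself a Q4 quant of Mistral-7B can fit in 8 GB. The ~4.85 bits/param figure for Q4_K_M and the grouped-query-attention dimensions are my own assumptions from reading around, so correct me if they're off:

```python
# Rough RAM estimate for Mistral-7B at Q4_K_M (assumed figures, not measured).
params_b = 7.24e9          # Mistral-7B parameter count
bits_per_param = 4.85      # approx. effective bits/param for Q4_K_M
weights_gb = params_b * bits_per_param / 8 / 1e9

# KV cache: Mistral-7B uses grouped-query attention (8 KV heads, head_dim 128)
layers, kv_heads, head_dim = 32, 8, 128
ctx = 4096                 # context length
kv_bytes = layers * 2 * ctx * kv_heads * head_dim * 2   # 2 = K+V, 2 bytes fp16
kv_gb = kv_bytes / 1e9

print(round(weights_gb, 2), round(kv_gb, 2))  # → 4.39 0.54
```

So roughly 4.4 GB of weights plus ~0.5 GB of KV cache at 4K context, which leaves maybe 2 GB for the OS and everything else. That's why it feels tight but workable.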

My main question:

What are the best ways to maximize usable RAM and performance for llama.cpp on an 8 GB system?

For example:

•   Better quantization choices

•   Swap/zram strategies on Linux

•   Smaller models that still feel conversational

•   Any other tricks people use on low-resource systems
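On the zram point specifically: this is the kind of setup I've read about and was planning to try. The 4G size and the zstd algorithm are guesses on my part, and it needs root, so please tell me if there's a better recipe:

```shell
# Sketch of a zram swap setup on Linux (size/algorithm are my guesses).
sudo modprobe zram                                       # load the zram module
dev=$(sudo zramctl --find --size 4G --algorithm zstd)    # allocate a compressed RAM disk
sudo mkswap "$dev"                                       # format it as swap
sudo swapon --priority 100 "$dev"                        # prefer it over any disk swap
swapon --show                                            # verify it's active
```

My understanding is that zram trades some CPU for compressed swap in RAM, which should hurt a lot less than swapping to the ThinkPad's disk, but I'd love to hear from anyone who's actually run llama.cpp this way.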

OS is Linux Mint 22.3 Cinnamon (64-bit).

I know this is a bit of an unusual use case, but if anyone has suggestions for squeezing more performance out of limited hardware, I’d really appreciate it.