Is there any hope of something lightweight to explain Linux commands, for somebody with a GTX1050?
Posted by floofcode@reddit | LocalLLaMA | 5 comments
I am looking for something like this:
Input:
```
- name: << LLM needs to generate this.>>
ansible.builtin.dnf:
name: epel-release
```
Output: "Enable EPEL Repository"
I'm developing some snippets to quickly create Ansible tasks, but once a snippet is generated, I need an LLM to briefly describe what it's for so I can set that as the task's name, and maybe give me 2-3 suggestions so I can pick the one I want.
It doesn't seem like this needs a super giant model. Will I need to train something myself, or is there something that can run on my ancient GPU?
Temporary_Expert_731@reddit
Llama 3.2 1B at Q4 will fit in about 1.3 GB of VRAM. I would just try running that in ollama and dropping in a few snippets to see if it works for you.
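Something like this is roughly what I mean, as a quick Python sketch against Ollama's local REST API (assumes you've pulled llama3.2:1b and the server is on its default port; the prompt wording is just an illustration):
```
# Rough sketch: ask a local Ollama server for short Ansible task names.
# Assumes `ollama pull llama3.2:1b` has been run and the server is on the
# default port (11434). Prompt wording is just an example, tweak to taste.
import requests

snippet = """\
ansible.builtin.dnf:
  name: epel-release
"""

prompt = (
    "Suggest 3 short, imperative names for this Ansible task, "
    "one per line, nothing else:\n\n" + snippet
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2:1b", "prompt": prompt, "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```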
FullstackSensei@reddit
Why not use the free tier of an online API to generate those descriptions? You shouldn't have any sensitive info in those scripts anyway.
floofcode@reddit (OP)
It's more about trying to figure out how to make use of LLMs in very narrow situations like this without an API. Do use-cases like this still need a huge amount of VRAM?
FullstackSensei@reddit
Depends on your definition of "huge". You can always offload some layers to system RAM and use combined GPU and CPU inference. Your use case sounds like you could get away with an 8-9B model, which would run decently at Q8 even on full CPU if your ansible playbooks aren't too long or too complicated. You'll need to be more descriptive and more precise in your prompt, so the LLM has a better chance of "understanding" what you want, but I'm of the opinion that you're always better off doing that anyways. You might also need to provide some context about your tasks and modules if they're not common patterns already beaten to death online (already in the training data of the LLM).
The suggestion to use an API is more about saving time, since you'll be able to iterate on your prompt quickly and figure out what additional information you need to provide to get the output you want. But if you're using this as an excuse to get your hands dirty running local LLMs, then you can do the same 100% offline. It'll just be slower to iterate without a powerful GPU.
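To give an idea of what "more descriptive and more precise" could look like in practice, here's a rough sketch of a few-shot prompt through Ollama's chat endpoint. The model name, system instruction, and worked example are all placeholders; the point is the shape of the prompt, not the exact wording:
```
# Rough sketch of a more explicit, few-shot prompt via Ollama's chat API.
# Model name and examples are placeholders; use whatever you run locally.
import requests

system = (
    "You name Ansible tasks. Given a task body, reply with exactly three "
    "short, imperative name suggestions, one per line, no numbering."
)

example_task = "ansible.builtin.dnf:\n  name: epel-release\n"
example_answer = "Enable EPEL repository\nInstall epel-release package\nAdd EPEL repo"

new_task = "ansible.builtin.systemd:\n  name: nginx\n  state: restarted\n"

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b",  # any ~8B instruct model you have pulled
        "stream": False,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": example_task},      # worked example
            {"role": "assistant", "content": example_answer},
            {"role": "user", "content": new_task},          # the real request
        ],
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```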
BoeJonDaker@reddit
I use Llama 3 8B abliterated. You'll have to use a GGUF and offload some layers to RAM, but the speed is decent.
Actually, I just tried it on 100% CPU and got 5 tokens/sec.
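If anyone wants to see what the offloading looks like, a rough llama-cpp-python sketch is below. The model path and n_gpu_layers value are placeholders; lower n_gpu_layers until the model fits in your card's VRAM, and the remaining layers stay in system RAM:
```
# Rough sketch of partial GPU offload with llama-cpp-python.
# Model path and n_gpu_layers are placeholders; tune them for your hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=20,  # placeholder; 0 = pure CPU, -1 = everything on GPU
    n_ctx=2048,
)

out = llm(
    "Suggest 3 short names for this Ansible task:\n"
    "ansible.builtin.dnf:\n  name: epel-release\n",
    max_tokens=64,
)
print(out["choices"][0]["text"])
```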