sammcj

Shout out to TabbyAPI - it's by far the best ExLlamaV2 server I've tried

Posted by sammcj@reddit | LocalLLaMA | 38 comments
Has anyone come across a good (open source) "AI native" document editor?

Posted by sammcj@reddit | LocalLLaMA | 20 comments
ESP32 -> Willow -> Home Assistant -> Mistral 7b <<

Posted by sammcj@reddit | LocalLLaMA | 4 comments
How are you managing your prompt collection? (Personal prompt library/templates etc)

Posted by sammcj@reddit | LocalLLaMA | 12 comments
It's been a while since we had new Qwen & Qwen Coder models...

Posted by sammcj@reddit | LocalLLaMA | 53 comments
DeepSeek banned from Australian Government Devices

Posted by sammcj@reddit | LocalLLaMA | 71 comments
Llamalink - Automatically symlink your Ollama models to lm-studio

Posted by sammcj@reddit | LocalLLaMA | 2 comments
Biased LLM Outputs, Tiananmen Square & Americanisations

Posted by sammcj@reddit | LocalLLaMA | 0 comments
"Hey Ollama" (Home Assistant + Ollama)

Posted by sammcj@reddit | LocalLLaMA | 19 comments
Ollama has merged in K/V cache quantisation support, halving the memory used by the context

Posted by sammcj@reddit | LocalLLaMA | 139 comments
I modified that Qwen Code Artefacts demo on HF to use Ollama locally

Posted by sammcj@reddit | LocalLLaMA | 1 comment
Ollama now runs inference concurrently by default

Posted by sammcj@reddit | LocalLLaMA | 24 comments
RIP My 2x RTX 3090, RTX A1000, 10x WD Red Pro 10TB (Power Surge) 😭

Posted by sammcj@reddit | LocalLLaMA | 138 comments
Anyone want to test my PR to enable quantised K/V cache in Ollama

Posted by sammcj@reddit | LocalLLaMA | 17 comments
My jank 2x 3090, 1x A4000 setup

Posted by sammcj@reddit | LocalLLaMA | 9 comments
It's been a while since DeepSeek released a new coder lite model...

Posted by sammcj@reddit | LocalLLaMA | 9 comments
llama.cpp merges support for TriLMs and BitNet b1.58

Posted by sammcj@reddit | LocalLLaMA | 10 comments
Open LLMs catching up to closed LLMs [coding/ELO] (Updated 10 July 2024)

Posted by sammcj@reddit | LocalLLaMA | 173 comments
Ollama merges tooling support

Posted by sammcj@reddit | LocalLLaMA | 13 comments
Ollama merges OpenAI compatible API endpoint for batch embeddings

Posted by sammcj@reddit | LocalLLaMA | 0 comments
Plandex - AI driven development in the terminal

Posted by sammcj@reddit | LocalLLaMA | 9 comments
I'm looking for a diagram that shows the rate of improvement of Open LLMs and how they've caught up to closed / API LLM providers

Posted by sammcj@reddit | LocalLLaMA | 7 comments
Ollama adds /v1/models and /v1/completions OpenAI compatible APIs

Posted by sammcj@reddit | LocalLLaMA | 6 comments
Ollama now runs inference in parallel by default

Posted by sammcj@reddit | LocalLLaMA | 0 comments
If your DeepSeek Coder V2 is outputting Chinese - your template is probably wrong (as are the official Ollama templates)

Posted by sammcj@reddit | LocalLLaMA | 14 comments
CUDA Graph support merged into llama.cpp (~5-18% performance gain on RTX 3090/4090)

Posted by sammcj@reddit | LocalLLaMA | 41 comments
The impact of flash_attention - comparing LM Studio (w/ FA) with Ollama (w/o FA)

Posted by sammcj@reddit | LocalLLaMA | 22 comments
Gollama - An Ollama model manager (TUI)

Posted by sammcj@reddit | LocalLLaMA | 6 comments
GGML Flash Attention support merged into llama.cpp

Posted by sammcj@reddit | LocalLLaMA | 114 comments