sammcj
-
Shout out to TabbyAPI - it's by far the best ExLlamaV2 server I've tried
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 38 comments
-
Has anyone come across a good (open source) "AI native" document editor?
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 20 comments
-
ESP32 -> Willow -> Home Assistant -> Mistral 7b <<
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 4 comments
-
How are you managing your prompt collection? (Personal prompt library/templates etc)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 12 comments
-
It's been a while since we had new Qwen & Qwen Coder models...
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 53 comments
-
DeepSeek banned from Australian Government Devices
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 71 comments
-
Llamalink - Automatically symlink your Ollama models to lm-studio
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Biased LLM Outputs, Tiananmen Square & Americanisations
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 0 comments
-
"Hey Ollama" (Home Assistant + Ollama)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 19 comments
-
Ollama has merged in K/V cache quantisation support, halving the memory used by the context
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 139 comments
-
I modified that Qwen Code Artefacts demo on HF to use Ollama locally
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Ollama now runs inference concurrently by default
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 24 comments
-
RIP My 2x RTX 3090, RTX A1000, 10x WD Red Pro 10TB (Power Surge)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 138 comments
-
Anyone want to test my PR to enable quantised K/V cache in Ollama
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 17 comments
-
My jank 2x 3090, 1x a4000 setup
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 9 comments
-
It's been a while since DeepSeek released a new coder lite model...
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 9 comments
-
llama.cpp merges support for TriLMs and BitNet b1.58
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Open LLMs catching up to closed LLMs [coding/ELO] (Updated 10 July 2024)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 173 comments
-
Ollama merges tooling support
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 13 comments
-
Ollama merges OpenAI compatible API endpoint for batch embeddings
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Plandex - AI driven development in the terminal
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 9 comments
-
I'm looking for a diagram that shows the rate of improvements of Open LLMs and how they've caught up to closed / API LLM providers
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 7 comments
-
Ollama adds /v1/models and /v1/completions OpenAI compatible APIs
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Ollama now runs inference in parallel by default
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 0 comments
-
If your DeepSeek Coder V2 is outputting Chinese - your template is probably wrong (as are the official Ollama templates)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 14 comments
-
CUDA Graph support merged into llama.cpp (+5-18%~ performance on RTX3090/4090)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 41 comments
-
The impact of flash_attention - comparing LM Studio (w/ FA) with Ollama (w/o FA)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 22 comments
-
Gollama - An Ollama model manager (TUI)
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 6 comments
-
GGML Flash Attention support merged into llama.cpp
Posted by sammcj@reddit | LocalLLaMA | View on Reddit | 114 comments