SomeOddCodeGuy

A little present for y'all: An easy to use offline API that serves up full text Wikipedia articles. Start it up, send in a query/prompt to the endpoint, get back a matching full wiki article to RAG against.

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 38 comments
Running Deepseek R1 0528 q4_K_M and mlx 4-bit on a Mac Studio M3

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 30 comments
WilmerAI: I just uploaded around 3 hours worth of video tutorials explaining the prompt routing, workflows, and walking through running it

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 26 comments
Try not to forget what Open Source AI is best at, and you'll enjoy it so much more

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 73 comments
Theory: trying to use newer and more powerful LLMs to sound more human is likely moving in the wrong direction

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 73 comments
My personal guide for developing software with AI assistance

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 55 comments
M3 Ultra Mac Studio 512GB prompt and write speeds for Deepseek V3 671b gguf q4_K_M, for those curious

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 110 comments
Please prove me wrong. Let's properly discuss Mac setups and inference speeds

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 6 comments
Mac Speed Comparison: M2 Ultra vs M3 Ultra using KoboldCpp

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 102 comments
Don't underestimate the power of RAG

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 40 comments
Sharing my unorthodox home setup, and how I use local LLMs

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 40 comments
What all front ends exist for connecting to LLM APIs?

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 2 comments
Qwen2 72b VL is actually really impressive. It's not perfect, but for a local model I'm certainly impressed (more info in comments)

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 43 comments
I've realized that Llama 4's odd architecture makes it perfect for my Mac and my workflows

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 72 comments
I really like the style of how QwQ represents code architecture. I haven't seen one draw it out like this.

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 29 comments
My personal guide for developing software with AI Assistance: Part 2

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 19 comments
I've uploaded new Wilmer users, and made another tutorial vid showing setup plus ollama hotswapping multiple 14b models on a single RTX 4090

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 1 comment
Deepseek 67b is amazing, and in at least 1 usecase it seems better than ChatGPT 4

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 5 comments
What is the current best Mistral 7b v0.3 finetune?

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 13 comments
The distilled R1 models likely work best in workflows, so now's a great time to learn those if you haven't already!

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 14 comments
Almost a year later, I can finally do this. A small teaser of a project I'm working on

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 120 comments
Tools to route requests to different LLMs based on topic?

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 2 comments
I used QwQ as a conversational thinker, and accidentally simulated awkward overthinking

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 33 comments
Here Are Some Real World Speeds For the Mac M2 Ultra, In Case You Were Curious

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 22 comments
OpenAI documentation showing a change from System role to Developer role

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 0 comments
Wilmer update after 5 months: the workflow based prompt router that supports rolling 'memories'

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 12 comments
MMLU-Pro all category test results for Llama 3 70b Instruct ggufs: q2_K_XXS, q2_K, q4_K_M, q5_K_M, q6_K, and q8_0

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 50 comments
Low Context Speed Comparison: Macbook, Mac Studios, and RTX 4090

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 37 comments
As someone who is passionate about workflows in LLMs, I'm finding it hard to trust o1's outputs

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 11 comments
I've realized that I honestly don't know WHAT the Mac Studio's bottleneck is...

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 3 comments
Sorry for the wait folks. Meet WilmerAI- my open source project to maximize the potential of Local LLMs via prompt routing and multi-model workflow management

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 39 comments
It looks like IBM just updated their 20b coding model

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 13 comments
Let's talk about API privacy and cost- what are some good ones?

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 20 comments
PSA: Gemma 27b ggufs can be pretty sensitive to BLAS batch size changes

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 7 comments
MMLU-Pro Combined Results- Including New Results for L3 8b SPPO, Hermes 2 Theta L3 8b, and Some Golden Oldies Like Dolphin 2.5 Mixtral, Nous Capybara 34b and WizardLM-2-7b

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 39 comments
MMLU-Pro Fun Part 2: Surprising result comparing Llama 3 70b using the right and wrong prompt templates. Bonus- quick speed comparison across H100, 4090s, Mac Studio and Macbook.

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 25 comments
A quick peek at the effect of quantization on Llama 3 8b and WizardLM 8x22b via 1 category of MMLU-Pro testing

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 51 comments
PSA: Just loading a gguf with higher context can negatively affect output, even with low context inputs

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 18 comments
Quick Start Guide To Converting Your Own GGUFs (including fp16)

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 1 comment
Ok, I admit- SillyTavern is a great way to test models after all

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 8 comments
Real World Speeds on the Mac: Koboldcpp Context Shift Edition!

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 2 comments
Real World Speeds on the Mac: We got a bump with new Llama.cpp/Koboldcpp

Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 16 comments