SomeOddCodeGuy
-
A little present for y'all: An easy to use offline API that serves up full text Wikipedia articles. Start it up, send in a query/prompt to the endpoint, get back a matching full wiki article to RAG against.
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 38 comments
-
Running Deepseek R1 0528 q4_K_M and mlx 4-bit on a Mac Studio M3
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 30 comments
-
WilmerAI: I just uploaded around 3 hours worth of video tutorials explaining the prompt routing, workflows, and walking through running it
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 26 comments
-
Try not to forget what Open Source AI is best at, and you'll enjoy it so much more
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 73 comments
-
Theory: trying to use newer and more powerful LLMs to sound more human is likely moving in the wrong direction
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 73 comments
-
My personal guide for developing software with AI assistance
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 55 comments
-
M3 Ultra Mac Studio 512GB prompt and write speeds for Deepseek V3 671b gguf q4_K_M, for those curious
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 110 comments
-
Please prove me wrong. Let's properly discuss Mac setups and inference speeds
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Mac Speed Comparison: M2 Ultra vs M3 Ultra using KoboldCpp
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 102 comments
-
Don't underestimate the power of RAG
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 40 comments
-
Sharing my unorthodox home setup, and how I use local LLMs
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 40 comments
-
What all front ends exist for connecting to LLM APIs?
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Qwen2 72b VL is actually really impressive. It's not perfect, but for a local model I'm certainly impressed (more info in comments)
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 43 comments
-
I've realized that Llama 4's odd architecture makes it perfect for my Mac and my workflows
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 72 comments
-
I really like the style of how QwQ represents code architecture. I haven't seen one draw it out like this.
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 29 comments
-
My personal guide for developing software with AI Assistance: Part 2
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 19 comments
-
I've uploaded new Wilmer users, and made another tutorial vid showing setup plus ollama hotswapping multiple 14b models on a single RTX 4090
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Deepseek 67b is amazing, and in at least 1 use case it seems better than ChatGPT 4
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 5 comments
-
What is the current best Mistral 7b v0.3 finetune?
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 13 comments
-
The distilled R1 models likely work best in workflows, so now's a great time to learn those if you haven't already!
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 14 comments
-
Almost a year later, I can finally do this. A small teaser of a project I'm working on
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 120 comments
-
Tools to route requests to different LLMs based on topic?
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 2 comments
-
I used QwQ as a conversational thinker, and accidentally simulated awkward overthinking
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 33 comments
-
Here Are Some Real World Speeds For the Mac M2 Ultra, In Case You Were Curious
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 22 comments
-
OpenAI documentation showing a change from System role to Developer role
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Wilmer update after 5 months: the workflow based prompt router that supports rolling 'memories'
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 12 comments
-
MMLU-Pro all category test results for Llama 3 70b Instruct ggufs: q2_K_XXS, q2_K, q4_K_M, q5_K_M, q6_K, and q8_0
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 50 comments
-
Low Context Speed Comparison: Macbook, Mac Studios, and RTX 4090
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 37 comments
-
As someone who is passionate about workflows in LLMs, I'm finding it hard to trust o1's outputs
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 11 comments
-
I've realized that I honestly don't know WHAT the Mac Studio's bottleneck is...
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Sorry for the wait folks. Meet WilmerAI- my open source project to maximize the potential of Local LLMs via prompt routing and multi-model workflow management
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 39 comments
-
It looks like IBM just updated their 20b coding model
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 13 comments
-
Let's talk about API privacy and cost- what are some good ones?
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 20 comments
-
PSA: Gemma 27b ggufs can be pretty sensitive to BLAS batch size changes
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 7 comments
-
MMLU-Pro Combined Results- Including New Results for L3 8b SPPO, Hermes 2 Theta L3 8b, and Some Golden Oldies Like Dolphin 2.5 Mixtral, Nous Capybara 34b and WizardLM-2-7b
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 39 comments
-
MMLU-Pro Fun Part 2: Surprising result comparing Llama 3 70b using the right and wrong prompt templates. Bonus- quick speed comparison across H100, 4090s, Mac Studio and Macbook.
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 25 comments
-
A quick peek at the effect of quantization on Llama 3 8b and WizardLM 8x22b via 1 category of MMLU-Pro testing
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 51 comments
-
PSA: Just loading a gguf with higher context can negatively affect output, even with low context inputs
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 18 comments
-
Quick Start Guide To Converting Your Own GGUFs (including fp16)
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Ok, I admit- SillyTavern is a great way to test models after all
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 8 comments
-
Real World Speeds on the Mac: Koboldcpp Context Shift Edition!
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Real World Speeds on the Mac: We got a bump with new Llama.cpp/Koboldcpp
Posted by SomeOddCodeGuy@reddit | LocalLLaMA | View on Reddit | 16 comments