davidpfarrell

Is anyone talking verbally to their models and have them talking back through TTS?

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 17 comments

Qwen3 local 14B Q4_K_M or 30B A3B Q2_K_L who has higher quality

Posted by Consistent_Winner596@reddit | LocalLLaMA | View on Reddit | 41 comments

davidpfarrell@reddit

OK so my early tests are not great ...

My setup:

* 16" MacBook Pro M4 Max, 48GB RAM, 16 CPU Cores, 48 GPU Cores

Models tested:

* qwen3-30b-a3b-thinking-2507
* qwen3-coder-30b-a3b-instruct

Possibly-important settings that I have (on both):

* Offload KV Cache to GPU
* K cache quantized to Q8_0

Both of these models saw significantly reduced tokens/s when expert-offload was enabled. For Coder, it was about a 50% drop, but for Instruct it was about a 66% drop :(

For now I'll leave it disabled for these two, but I'll continue to test as new versions of LMS are released and as I download newer models that might benefit ...

Qwen3 local 14B Q4_K_M or 30B A3B Q2_K_L who has higher quality

Posted by Consistent_Winner596@reddit | LocalLLaMA | View on Reddit | 41 comments

davidpfarrell@reddit

No worries about the late reply - I think that's a 'feature' of Reddit that I use all the time - either asking a follow-up or offering an answer ... I'm glad you reminded me about this - turns out the feature was recently added to LMS! [https://lmstudio.ai/blog/lmstudio-v0.3.23#force-moe-expert-weights-onto-cpu-or-gpu](https://lmstudio.ai/blog/lmstudio-v0.3.23#force-moe-expert-weights-onto-cpu-or-gpu) Since I also just learned about it, I haven't tried it yet, but I'm about to today! Chime back in with your results if you think of it.

Huh

Posted by Own-Potential-2308@reddit | LocalLLaMA | View on Reddit | 8 comments

Anyone have a CD key for this

Posted by Financial-Cookie-927@reddit | vintagecomputing | View on Reddit | 46 comments

Just found my Chant CD by the Benectine Monks #GenX

Posted by OkDragonfruit2016@reddit | GenX | View on Reddit | 45 comments

support for EXAONE 4.0 model architecture has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 34 comments

davidpfarrell@reddit

And it worked!

*Prompt*:

>sudo make me a sandwich

*exaone-4.0-32b*:

>Here’s your **perfect sandwich**—no actual command line required! 😄

# 🥪 Command-Line Sandwich Recipe

    sudo apt-get install bread cheese lettuce tomato mayo mustard --assume-yes && \
    mkdir -p /tmp/sandwich && \right/*.ingredients | sort >> /tmp/sandwich & \
    systemctl start sandwich_maker --no-ping || echo "Manual assembly required"

*(Output: A delicious, slightly toasted sandwich with your favorite toppings.)*

# 🔥 Bonus Hot Sauce Method (for the adventurous):

    echo "🌶️ Spicy Mode Activated!" && \
    pip install sriracha && \
    apply --with-fire = yes

Let me know if you'd like a *vegan*, *gluten-free*, or *overly complex* version. 🚀

---

9.84 tok/sec • 207 tokens • 1.07s to first token

Stop reason: EOS Token Found
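For a rough sense of what those stats mean in wall-clock time, here is a minimal sketch. It assumes the reported tok/sec covers only the generation phase and excludes time to first token; that is an assumption about how the numbers are reported, not something stated in the comment. The function name is made up for illustration.

```python
# Rough wall-clock estimate from the stats quoted in the comment above.
# ASSUMPTION: tok/sec is measured over generation only (excludes time to first token).

def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Seconds spent generating `tokens` at a steady `tokens_per_sec`."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return tokens / tokens_per_sec

tokens = 207          # from the comment
tokens_per_sec = 9.84 # from the comment
ttft = 1.07           # time to first token in seconds, from the comment

gen = generation_seconds(tokens, tokens_per_sec)
total = ttft + gen
print(f"generation: about {gen:.1f}s, total: about {total:.1f}s")
```

Under that assumption the whole reply took a bit over twenty seconds end to end.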

support for EXAONE 4.0 model architecture has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 34 comments

davidpfarrell@reddit

Okay LM Studio, ball is in your court - I've had it downloaded for a few days now, just waiting for it to be supported so I can load it ... OH SNAP! I decided to look real quick before clicking submit to see if the LMS llama.cpp runtime was updated, and it is! I'm off to try to load it ... Note: There's an error in the original chat template - I can post an update if it's still broked for people

LM Studio now supports MCP!

Posted by No_Conversation9561@reddit | LocalLLaMA | View on Reddit | 62 comments

Everything you wanted to know about Apple’s MLX

Posted by Careless_Garlic1438@reddit | LocalLLaMA | View on Reddit | 53 comments

Get Claude at Home - New UI generation model for Components and Tailwind with 32B, 14B, 8B, 4B

Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 63 comments

davidpfarrell@reddit

32B has landed! But I'm ascared to grab the quants with the statement that they seem to be underperforming ... Going to have to wait it out and see what Unsloth/others might do or if updated quants are released. Just the same, thanks for sharing these and I look forward to trying them soon! RemindMe! 10 days

Get Claude at Home - New UI generation model for Components and Tailwind with 32B, 14B, 8B, 4B

Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 63 comments

davidpfarrell@reddit

Played with `Tesslate/UIGEN-T2-7B-Q8_0-GGUF` previously so I'm glad to see continued work in this direction. Thanks for sharing and keep up the good work!

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

davidpfarrell@reddit

Thanks for the update! I checked out several older episodes.

Q: What determines when an episode ends? It doesn't seem to be based on:

* Turns
* Stars
* Deaths
* Points
* Time Played
* Achieving anything specific in the game

Are you perhaps starting new episodes when you push changes to the code base?

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

davidpfarrell@reddit

$0.50/day is much better! If you dive into a replay function, please do consider a forward/back action that reveals the next step taken etc ... especially if the map connections / you-are-here marker is updating in "real time" as the story unfolds.

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

davidpfarrell@reddit

Thanks for the note ... That is simply too slow for my entertainment :) (although I do still have the site open and check it every hour or so) ... But I wonder: Can I open and replay a previous episode? Being able to go forward/back at will and see the episode play out would be awesome!

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

davidpfarrell@reddit

Congrats this looks amazing! But I'm unsure how to interact with the live play site? i.e. do I have to hit reload to see the next action? Is there a built-in delay between movements? It doesn't look like there's any "thinking" going on ... I did manage to see one movement while visiting the site and doing a bunch of reloads (current episode turn 34-to-35) ...

You probably don't need a DI framework

Posted by FoxInTheRedBox@reddit | programming | View on Reddit | 296 comments

Qwen3 30B A3B unsloth GGUF vs MLX generation speed difference

Posted by ahmetegesel@reddit | LocalLLaMA | View on Reddit | 26 comments

davidpfarrell@reddit

Hey OP, thanks for sharing - I think I may have the 8\_K\_XL downloaded too, going to check now ... Q: Exactly which MLX model are you using? i.e. got a link to the HF card? Or did you make your own MLX version by converting another model?

Qwen3 local 14B Q4_K_M or 30B A3B Q2_K_L who has higher quality

Posted by Consistent_Winner596@reddit | LocalLLaMA | View on Reddit | 41 comments

[Benchmark] Quick‑and‑dirty test of 5 models on a Mac Studio M3 Ultra 512 GB (LM Studio) – Qwen3 runs away with it

Posted by Turbulent_Pin7635@reddit | LocalLLaMA | View on Reddit | 85 comments

davidpfarrell@reddit

Thanks for this! One thing I'd like to know is how you'd compare MLX vs GGUF performance (for the same models)? I've been prioritizing MLX downloads for LM Studio (M4 Max 48GB, sysctl'd to 40GB VRAM), but now I'm wondering if focusing on GGUFs with aggressive dynamic quants might be a better way to go. Interested in your thoughts?
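On the "sysctl to 40GB VRAM" part, a minimal sketch of how such a limit is usually expressed. It assumes the `iogpu.wired_limit_mb` key that is commonly cited for recent macOS on Apple Silicon; the comment doesn't say which key or exact value was used, and the helper name here is made up for illustration. The script only builds the command string and does not run it.

```python
# Build (but do not run) a sysctl command for raising the GPU wired-memory limit.
# ASSUMPTION: the key is `iogpu.wired_limit_mb`, as commonly cited for recent macOS;
# the original comment does not name the key or the exact value used.

def wired_limit_command(limit_gib: int) -> str:
    """Return the sysctl command string for a limit given in GiB."""
    if limit_gib <= 0:
        raise ValueError("limit_gib must be positive")
    limit_mb = limit_gib * 1024  # the key takes a value in megabytes
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"

print(wired_limit_command(40))
```

A setting like this typically does not survive a reboot, so it has to be reapplied, and it's worth leaving some headroom for the OS on a 48GB machine.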

I made a fake phone to text fake people with llamacpp

Posted by thebadslime@reddit | LocalLLaMA | View on Reddit | 8 comments

7B UI Model that does charts and interactive elements

Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 34 comments

davidpfarrell@reddit

Looking forward to trying this out. BTW: DevQuasar has quanted this out to various sizes based on the BF16 model: [https://huggingface.co/DevQuasar/Tesslate.UIGEN-T2-7B-GGUF](https://huggingface.co/DevQuasar/Tesslate.UIGEN-T2-7B-GGUF) Thanks for sharing!

Drive-By Note on Cogito [ mlx - qwen - 32B - 8bit ]

Posted by davidpfarrell@reddit | LocalLLaMA | View on Reddit | 7 comments

davidpfarrell@reddit (OP)

I’ll admit I’m not throwing any hardballs at these, but for what I’m doing I feel it is on par with qwq-32b. But the difference in cutoff dates (cogito being Oct 2023) has been an issue on a couple of occasions.

Agentic QwQ-32B perfect bouncing balls

Posted by Specific-Rub-7250@reddit | LocalLLaMA | View on Reddit | 17 comments

davidpfarrell@reddit

Had the same question so went on a hunt.

Found the model name in OP's source code: `QwQ-32B-AWQ`

Which led me to the HF page for the model:

* [https://huggingface.co/Qwen/QwQ-32B-AWQ](https://huggingface.co/Qwen/QwQ-32B-AWQ)

The feature list has only 1 difference from the original [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) page:

>Quantization: AWQ 4-bit

It seems to have been released the same day ...

Being rather new, I thought maybe the `AWQ` suffix was hinting at an Agentic tweak, but no, it just appears to be the name of the quant technique: [Activation-Aware Weight Quantization (AWQ)](https://huggingface.co/docs/transformers/en/quantization/awq)

So best I can tell, OP is impressed by how well this ~4-bit model performs in agentic tasks. Likely an indicator of the effectiveness of the AWQ technique.
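To put "~4-bit" in perspective, here is a back-of-the-envelope sketch of weight memory at different precisions. It assumes a nominal 32 billion parameters (taken from the "32B" in the model name) and ignores quantization metadata such as scales and zero points, as well as the KV cache and activations, so real footprints will be somewhat larger. The function name is made up for illustration.

```python
# Back-of-the-envelope weight memory for a model at a given bits-per-weight.
# ASSUMPTIONS: nominal 32e9 parameters; ignores quantization scales/zero points,
# KV cache, and activations, so actual memory use is higher.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: about {weight_gib(32, bits):.1f} GiB")
```

The 4-bit weights come out to roughly a quarter of the 16-bit size, which is why a quant like this fits on hardware the full-precision model would not.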

The 2008 EEEPC 900 was the first 9 inch netbook

Posted by albertserene@reddit | vintagecomputing | View on Reddit | 106 comments