davidpfarrell

Is anyone talking verbally to their models and have them talking back through TTS?

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 17 comments

[-]

davidpfarrell@reddit

Do you use an instruction to help the model keeps its replies short and more tts-friendly?

Qwen3 local 14B Q4_K_M or 30B A3B Q2_K_L who has higher quality

Posted by Consistent_Winner596@reddit | LocalLLaMA | View on Reddit | 41 comments

[-]

OK so my early tests are not great ... My setup: \* 16" Macbook Pro M4 Max 48GB Ram, 16 CPU Cores, 48 GPU Cores Models Tested: \* qwen3-30b-a3b-thinking-2507 \* qwen3-coder-30b-a3b-instruct Possibly-important settings that I have (no both): \* Offload KV Cache to GPU \* K cache quantized to Q8\_0 Both of these models saw significantly reduced tokens/s when expert-offload was enabled. For Coder, it was about a 50% drop, but for Instruct was about a 66% drop :( For now I'll leave it disabled for these two, but I'll continue to test as new versions of LMS are released and as I download newer models that might benefit ...

Qwen3 local 14B Q4_K_M or 30B A3B Q2_K_L who has higher quality

Posted by Consistent_Winner596@reddit | LocalLLaMA | View on Reddit | 41 comments

[-]

davidpfarrell@reddit

No worries about the late reply - I think that's a 'feature' of Reddit that I use all the time - Either asking a follow up or offering an answer ... I'm glad you reminded me about this - Turns out the feature was recently added to LMS ! [https://lmstudio.ai/blog/lmstudio-v0.3.23#force-moe-expert-weights-onto-cpu-or-gpu](https://lmstudio.ai/blog/lmstudio-v0.3.23#force-moe-expert-weights-onto-cpu-or-gpu) Since I also just learned about it, I haven't it yet, but I'm about to today! Chime back in with your results if you think of it.

Huh

Posted by Own-Potential-2308@reddit | LocalLLaMA | View on Reddit | 8 comments

[-]

davidpfarrell@reddit

Flashback to jr engineer asking honest question "Wouldn't it be easier to just let the browser send the full database query in the url" ?

[-]

davidpfarrell@reddit

For giggles I tried to see if could coax any of my ai agents into spitting one out, but my prompt fu wasn't good enough :)

Just found my Chant CD by the Benectine Monks #GenX

Posted by OkDragonfruit2016@reddit | GenX | View on Reddit | 45 comments

[-]

davidpfarrell@reddit

LOL that was a close one you almost lost it! I know exactly where mine is at this very moment :)

support for EXAONE 4.0 model architecture has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 34 comments

[-]

davidpfarrell@reddit

And it worked! *Prompt*: >sudo make me a sandwich *exaone-4.0-32b*: >Here’s your **perfect sandwich**—no actual command line required! 😄 # 🥪 Command-Line Sandwich Recipe sudo apt-get install bread cheese lettuce tomato mayo mustard --assume-yes && \ mkdir -p /tmp/sandwich && \right/*.ingredients | sort >> /tmp/sandwich & \ systemctl start sandwich_maker --no-ping || echo "Manual assembly required" *(Output: A delicious, slightly toasted sandwich with your favorite toppings.)* # 🔥 Bonus Hot Sauce Method (for the adventurous): echo "🌶️ Spicy Mode Activated!" && \ pip install sriracha && \ apply --with-fire = yes Let me know if you'd like a *vegan*, *gluten-free*, or *overly complex* version. 🚀 \--- 9.84 tok/sec • 207 tokens • 1.07s to first token Stop reason: EOS Token Found

support for EXAONE 4.0 model architecture has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 34 comments

[-]

davidpfarrell@reddit

Okay LM Studio, ball is in your court for - I've had it downloaded for a few days now just for it to be supported so I can load it ... OH SNAP! I decided to look real quick before clicking submit to see if the LMS Ollama runtime was updated, and it is! I'm off to try to load it ... Note: There's an error in the original chat template - I can post an update if its still broked for people

LM Studio now supports MCP!

Posted by No_Conversation9561@reddit | LocalLLaMA | View on Reddit | 62 comments

[-]

davidpfarrell@reddit

I've been seeing mention of in the beta updates but couldn't find it in the settings ... Totally stoked to check this out!

Everything you wanted to know about Apple’s MLX

Posted by Careless_Garlic1438@reddit | LocalLLaMA | View on Reddit | 53 comments

[-]

davidpfarrell@reddit

Checked the repo - Not a single screenshot of the app or even the cli running?

Get Claude at Home - New UI generation model for Components and Tailwind with 32B, 14B, 8B, 4B

Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 63 comments

[-]

davidpfarrell@reddit

RemindMe! 10 days

Get Claude at Home - New UI generation model for Components and Tailwind with 32B, 14B, 8B, 4B

Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 63 comments

[-]

davidpfarrell@reddit

32B has landed! But I'm ascared to grab the quants with the statement that they seem to be underperforming ... Going to have to wait it out and see what Unsloth/others might do or if updated quants are released. Just the same, thanks for sharing these and I look forward to trying them soon! RemindMe! 10 days

Get Claude at Home - New UI generation model for Components and Tailwind with 32B, 14B, 8B, 4B

Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 63 comments

[-]

davidpfarrell@reddit

Played with \`Tesslate/UIGEN-T2-7B-Q8\_0-GGUF\` previously so I'm glad to see continued work in this direction. Thanks for sharing and keep up the good work!

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

[-]

davidpfarrell@reddit

Thanks for the update I checked out several older episodes ! Q: What determines when an episode ends? Doesn't seem to be based on: \* Turns \* Stars \* Deaths \* Points \* Time Played \* Achieving anything specific in the game Are you perhaps starting new episodes when you push changes to the code base?

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

[-]

davidpfarrell@reddit

$0.50/day is much better ! If you dive into a reply function, please do consider a forward/back action that reveals the next step taken etc ... especially if the map connections / you-are-here is updating in " real time" as the story unfolds?

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

[-]

davidpfarrell@reddit

Thanks for the note ... That is simply too slow for my entertainment :) (although I do still have the site open and check it every hour or so) ... But I wonder: Can I open and replay a previous episode? Being able to go forward/back at will and see the episode play out would be awesome!

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

[-]

davidpfarrell@reddit

Dude you missed the opportunity to name it "**Zorkestrator**" :)

ZorkGPT: Open source AI agent that plays the classic text adventure game Zork

Posted by stickystyle@reddit | LocalLLaMA | View on Reddit | 81 comments

[-]

davidpfarrell@reddit

Congrats this looks amazing! But I'm unsure how to interact with the live play site? i.e. do I have to hit reload to see the next action? Is there a built-in delay between movements? It doesn't look like there's any "thinking" going on ... I did manage to see one movement while visiting the site and doing a bunch of reloads (current episode turn 34-to-35) ...

You probably don't need a DI framework

Posted by FoxInTheRedBox@reddit | programming | View on Reddit | 296 comments

[-]

davidpfarrell@reddit

I’ll totally give SpiteDI a try when it releases!

Qwen3 30B A3B unsloth GGUF vs MLX generation speed difference

Posted by ahmetegesel@reddit | LocalLLaMA | View on Reddit | 26 comments

[-]

davidpfarrell@reddit

Hey OP thanks for sharing - I think I may have the 8\_K\_XL downloaded too, going to check now ... Q: Exactly which MLX model are you using? i.e got link to HF Card? Or did you make your own mlx by conversion of another model?

Qwen3 local 14B Q4_K_M or 30B A3B Q2_K_L who has higher quality

Posted by Consistent_Winner596@reddit | LocalLLaMA | View on Reddit | 41 comments

[-]

davidpfarrell@reddit

Sounds like a good idea - I'm in LM Studio - Is there a specific config/comb that ensures the experts will be offloaded to CPU?

[Benchmark] Quick‑and‑dirty test of 5 models on a Mac Studio M3 Ultra 512 GB (LM Studio) – Qwen3 runs away with it

Posted by Turbulent_Pin7635@reddit | LocalLLaMA | View on Reddit | 85 comments

[-]

davidpfarrell@reddit

Thanks for this! One thing I'd like to know is how you compare the MLX vs GGFU performance (for same models)? I've been prioritizing MLX downloads for LM Studio (M4 Max 48GB sysctl to 40GB VRAM), but now wondering if focusing on GGFU's with aggressive dynamic quants might be a better way to go. Interested in your thoughts?

I made a fake phone to text fake people with llamacpp

Posted by thebadslime@reddit | LocalLLaMA | View on Reddit | 8 comments

[-]

davidpfarrell@reddit

Hmmm, maybe a version where kids can text Santa, the Easter Bunny, the Tooth Fairy ...

7B UI Model that does charts and interactive elements

Posted by United-Rush4073@reddit | LocalLLaMA | View on Reddit | 34 comments

[-]

davidpfarrell@reddit

Looking forward to trying this out. BTW: DevQuasar has quanted this out to various sizes based on the BF16 model: [https://huggingface.co/DevQuasar/Tesslate.UIGEN-T2-7B-GGUF](https://huggingface.co/DevQuasar/Tesslate.UIGEN-T2-7B-GGUF) Thanks for sharing!

Drive-By Note on Cogito [ mlx - qwen - 32B - 8bit ]

Posted by davidpfarrell@reddit | LocalLLaMA | View on Reddit | 7 comments

[-]

davidpfarrell@reddit (OP)

I’ll admit i’m not throwing any hardballs at these but for what im doing i feel it is on par with qwq-32b. But the difference in cutoff dates (cogito being oct 2023) has been an issue on a couple of occasions.

Agentic QwQ-32B perfect bouncing balls

Posted by Specific-Rub-7250@reddit | LocalLLaMA | View on Reddit | 17 comments

[-]

davidpfarrell@reddit

Had the same question so went on a hunt. Found the model name in OP's source code: `QwQ-32B-AWQ` Which led me to the HF page for the model: \* [https://huggingface.co/Qwen/QwQ-32B-AWQ](https://huggingface.co/Qwen/QwQ-32B-AWQ) The feature list has only 1 difference from the original [QWQ-32B](https://huggingface.co/Qwen/QwQ-32B) page: >Quantization: AWQ 4-bit It seems to have been released the same day ... Being rather new I thought maybe the \`AWQ\` suffix was hinting at an Agentic tweak, but no it just appears to be adaptive quant technique: [Activation-Aware Weight Quantization (AWQ)](https://huggingface.co/docs/transformers/en/quantization/awq) So best I can tell OP is impressed how well this \~4-bit model performs in agentic tasks. Likely an indicator for the effectiveness of the AWQ technique.

The 2008 EEEPC 900 was the first 9 inch netbook

Posted by albertserene@reddit | vintagecomputing | View on Reddit | 106 comments

[-]

davidpfarrell@reddit

How fun to see this again! We bought it for my MIL and she loved it! I actually bought on OLPC around the same time as well.