maddogawl

Phi-4 has been released

Posted by paf1138@reddit | LocalLLaMA | View on Reddit | 229 comments

[-]

Do you think that issue would also impact being able to run it in LM Studio with AMD hardware? I also can't get the model to load for the life of me. Tried with ROCm, Vulkan, and down to a super low context window, and it won't load. Q3, Q4, Q6, none of them load for me :/ Very vague error: (Exit code: 0). Some model operation failed. Try a different model and/or config.

Phi-4 has been released

Posted by paf1138@reddit | LocalLLaMA | View on Reddit | 229 comments

[-]

maddogawl@reddit

i appreciated it so much, it became one of my most used models for agent work locally.

Phi-4 has been released

Posted by paf1138@reddit | LocalLLaMA | View on Reddit | 229 comments

[-]

maddogawl@reddit

Oh wow, i'm glad I checked here, I couldn't for the life of me figure out why these weren't running.

DeepSeek V3 is the shit.

Posted by Odd-Environment-7193@reddit | LocalLLaMA | View on Reddit | 301 comments

[-]

maddogawl@reddit

yeah i've totally switched over to it now!

We deserve something better than LangChain

Posted by Available_Ad_5360@reddit | LocalLLaMA | View on Reddit | 68 comments

[-]

maddogawl@reddit

How do you feel about Autogen? I felt like it balances the black box feeling with being helpful really well.

Get multiple llms talking to each other?

Posted by Aggressive_Special25@reddit | LocalLLaMA | View on Reddit | 4 comments

[-]

maddogawl@reddit

I’m working on a better guide but here’s a video I did on having agents work together to build a book GitHub link is there as well. I’m currently working on 2 side project. 1. Having 2 LLMs play a game 2. Having a ton of agents build a game design document Should have videos out on those this week. https://youtu.be/EVrL6Qg7e9A

Get multiple llms talking to each other?

Posted by Aggressive_Special25@reddit | LocalLLaMA | View on Reddit | 4 comments

[-]

maddogawl@reddit

Python hitting your server ran from LMstudio

Deepseek-V3 GGUF's

Posted by fraschm98@reddit | LocalLLaMA | View on Reddit | 79 comments

[-]

maddogawl@reddit

I bet its like a small heater lol! Thats a really nice build you have. I was thinking about building a dedicated LLM machine with 4 - 8 of the Intel B580's but I need to get one first to see how it performs. That or get another 7900XTX and add it to my main computer.

Deepseek-V3 GGUF's

Posted by fraschm98@reddit | LocalLLaMA | View on Reddit | 79 comments

[-]

maddogawl@reddit

Jealous, my Motherboard only supports up to 256GB

LLM as survival knowledge base

Posted by NickNau@reddit | LocalLLaMA | View on Reddit | 152 comments

[-]

maddogawl@reddit

I was recently watching Silo on Apple TV, which got me to thinking about how we could store all of the worlds history without needing physical copies. I feel like LLMs are destined for that, we could send the entire Earths history to another planet in the future. Its really amazing to think about that. I'm really curious how close we are to that today. Could we take opensource DeepSeek V3 and have it give us detailed history lessons, and how accurate would it be? My mind is spinning lol

deepseek suks

Posted by RouteGuru@reddit | LocalLLaMA | View on Reddit | 11 comments

[-]

maddogawl@reddit

Curious what small coding errors you are seeing, what technology, language etc? I'm primarily working with Typescript and Python, and I've found it to be near perfect. Claude I usually have to tweak things a lot after it gives me code, where DeepSeek almost always seems to be good first try.

deepseek suks

Posted by RouteGuru@reddit | LocalLLaMA | View on Reddit | 11 comments

[-]

maddogawl@reddit

I’m running on their site. Used it for 10 hours of coding today and it was really good.

deepseek suks

Posted by RouteGuru@reddit | LocalLLaMA | View on Reddit | 11 comments

[-]

maddogawl@reddit

I’ve had the complete opposite experience, I feel like it does phenomenal at keeping context at least in terms of coding.

I used AI agents to see if I could write an entire book | AutoGen + Mistral-Nemo

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 30 comments

[-]

maddogawl@reddit (OP)

I love that character creator, one idea I had for writing a book was to create an agent for each character and have it write its own story from its point of view based on the situation around it. Seems ambitious to pull off, but could be so cool. I could see using character generators being used to fill out extras in the book, that have full storylines of its own. Thank you for sharing KoboldAI, I did not know that existed, checking it out more tonight!

I used AI agents to see if I could write an entire book | AutoGen + Mistral-Nemo

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 30 comments

[-]

maddogawl@reddit (OP)

I found around the bigger the better, I landed on around 32k min with Mistral-Nemo to get the best results, but I did get okay results at 1/2 that.

I used AI agents to see if I could write an entire book | AutoGen + Mistral-Nemo

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 30 comments

[-]

maddogawl@reddit (OP)

I agree, I do think it will get there, part of me wonders if a model could be fine tuned on writing styles and trained to respond not as an assistant but as a writer. Then you could use assistant models with the writer models. Totally theoretical, but I agree right now they are not good enough.

I used AI agents to see if I could write an entire book | AutoGen + Mistral-Nemo

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 30 comments

[-]

maddogawl@reddit (OP)

Here is the source code, I would love to see if anyone has feedback on this little side project I took on: [https://github.com/adamwlarson/ai-book-writer](https://github.com/adamwlarson/ai-book-writer) I really feel we are just scratching the surface of AI agents and their possibility.

µLocalGLaDOS - offline Personality Core

Posted by Reddactor@reddit | LocalLLaMA | View on Reddit | 136 comments

[-]

maddogawl@reddit

I'm impressed, gives me so many ideas on things I want to try now. Thank you for sharing this!

Radeon GPU for local LLM?

Posted by PetMogwai@reddit | LocalLLaMA | View on Reddit | 32 comments

[-]

maddogawl@reddit

Ahhh thank you for calling that out. I assumed it was an issue regardless of OS. Seems like I need to upgrade my Linus server.

Radeon GPU for local LLM?

Posted by PetMogwai@reddit | LocalLLaMA | View on Reddit | 32 comments

[-]

maddogawl@reddit

I have a single 7900xtx and would love to buy a 2nd, if you end up getting 2 please come back and tell me how it performs.

Radeon GPU for local LLM?

Posted by PetMogwai@reddit | LocalLLaMA | View on Reddit | 32 comments

[-]

maddogawl@reddit

How do you manage drivers when you have AMD and Nvidia both installed at the same time? I've always thought that could cause a ton of headaches

What's your primary local LLM at the end of 2024?

Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 210 comments

[-]

maddogawl@reddit

I’m really enjoying phi-4 the unofficial release. It seems to be good or decent at everything I try, from coding to writing. QWQ is probably my next one

My Apple Intelligence Writing Tools for Windows/Linux/macOS app just had a huge new update. It supports a ton of local LLM implementations, and is open source & free :D. You can now chat with its one-click summaries of websites/YT videos/docs, and bring up an LLM chat UI anytime. Here's a new demo!

Posted by TechExpert2910@reddit | LocalLLaMA | View on Reddit | 38 comments

[-]

maddogawl@reddit

this is amazing, thank you for making this! Sent you a tip, hopefully you are able to keep working on this. As a developer myself I know how hard it is to find time to build things like this for the community.

How does a model like QwQ do calculations like 4692*2 „in its head“?

Posted by andWan@reddit | LocalLLaMA | View on Reddit | 33 comments

[-]

maddogawl@reddit

I noticed this seemed to be happening as well

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

Thank you for confirming you are seeing this too.

Thoughts on Intel B850 GPU with 12GB VRAM for $250?

Posted by Chemical_Elk7746@reddit | LocalLLaMA | View on Reddit | 70 comments

[-]

maddogawl@reddit

I wanted to buy one to test it out, but they were sold out everywhere.

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-24-8-1.html You can find it towards the bottom

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

This is exactly what I’m seeing

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

Good point!

Open models wishlist

Posted by hackerllama@reddit | LocalLLaMA | View on Reddit | 238 comments

[-]

maddogawl@reddit

Thank you so much for reaching out to the community to ask. My main use case is coding, and i've found that Gemma 27b while a good model just isn't great for my use case. I'd love a model that has some additional reasoning capabilities that I could run locally. I find that my go to local model is QwQ at the moment which is incredible but very wordy. More context is always better for what i'm doing, but i can work around that most of the time.

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

Thats a good idea, i'm going to do that now.

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

I'm running the QwQ 32B param model, i'm wondering if its only the larger ones based on your feedback.

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

Is that with the most recent driver?

Latest AMD Driver 24.12.1 performs significantly worse than 24.8.1 running QwQ.

Posted by maddogawl@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

maddogawl@reddit (OP)

That note is what prompted me to try upgrading. So your 100% GPU load went away after loading the model?

Gemini 2.0 Flash Experimental, anyone tried it?

Posted by robberviet@reddit | LocalLLaMA | View on Reddit | 67 comments

[-]

maddogawl@reddit

I'm using Sonnet 3.5, putting together some larger tests at the moment, and its really blowing my mind how much its competing with 3.5 for my use cases. I primarily use it for coding, a mix of data science ML model building, data cleaning, feature engineering, as well as backend and frontend code using Vue.js and Typescript.

Gemini 2.0 Flash beating Claude Sonnet 3.5 on SWE-Bench was not on my bingo card

Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 169 comments

[-]

maddogawl@reddit

Today it was amazing using Gemini 2.0 Flash, my only gripe is that I hit moments where responses were erroring out, or taking 300+ seconds. I have a feeling this is a scaling issue since it just released. It really crushed code for me today.

Gemini 2.0 Flash Experimental, anyone tried it?

Posted by robberviet@reddit | LocalLLaMA | View on Reddit | 67 comments

[-]

maddogawl@reddit

I've been trying it out, doing side by side comparisons with Claude, QWQ for a specific data science problem where I want to create a model that generates a propensity score. This is a very narrow use case, but what I found was the following. Pros: 1. The response time is incredibly fast 2. The quality is on par with Claude for the first response, this is using identical setup and prompts. 3. Both initial versions were very flawed. Cons: 1. Fixing errors in 2.5, pasting Python error leads to a new version of the code that wasn't fixed. I gave it 5 attempts, and the problem wasn't resolved. In Claude it had similar issues that were resolved after 3 attempts. Mixed: 1. The model each generated were fine, but what I liked about Googles was how it attempted to test multiple models against each other, where Claude just picked one. 2. The final quality of the model is still up in the air, but the features generated by the Google model were much more basic, where Claude put together some much more complex features. I eventually hit a point with Google's where it quit giving me responses, i'm assuming they are hitting demand limits.

Llama 3.3 on a 4090 - quick feedback

Posted by latentmag@reddit | LocalLLaMA | View on Reddit | 105 comments

[-]