How do you discover & choose the right models for your agents? (genuinely curious)

Posted by Curious-Engineer22@reddit | LocalLLaMA | 6 comments

I'm trying to understand how people actually find the right model for their use case.

If you've recently picked a model for a project, how did you do it?

A few specific questions:

1. Where did you start your search? (HF search, Reddit, benchmarks, etc.)
2. How long did it take? (minutes, hours, days?)
3. What factors mattered most? (accuracy, speed, size?)
4. Did you test multiple models or commit to one?
5. How confident were you in your choice?

Also curious: what would make this process easier?

My hypothesis is that most of us are winging it more than we'd like to admit. Would love to hear if others feel the same way or if I'm just doing it wrong!

Fit-Practice-9612@reddit

It’s largely a matter of experimenting. Many platforms let you run the same prompt across multiple models in parallel, so you can compare outcomes side by side. By tracking metrics like latency, cost, token usage, speed, and accuracy, you can evaluate the trade-offs and pick the model that best fits your needs. Hope that helps.
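
As a rough illustration of that kind of side-by-side comparison, here is a minimal sketch. It assumes each model is wrapped in a plain Python callable that takes a prompt and returns text; `compare_models`, `ModelReport`, the substring-based accuracy check, and the word count standing in for token usage are all hypothetical simplifications, not any particular platform's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ModelReport:
    name: str
    accuracy: float          # fraction of cases whose output contains the expected answer
    avg_latency_s: float     # mean wall-clock seconds per call
    avg_output_words: float  # crude stand-in for token usage (assumption, not a real tokenizer)


def compare_models(
    models: Dict[str, Callable[[str], str]],
    cases: List[Tuple[str, str]],
) -> List[ModelReport]:
    """Run every (prompt, expected) case through every model and summarize the results."""
    if not cases:
        raise ValueError("need at least one test case")

    reports = []
    for name, generate in models.items():
        correct = 0
        total_latency = 0.0
        total_words = 0
        for prompt, expected in cases:
            start = time.perf_counter()
            output = generate(prompt)
            total_latency += time.perf_counter() - start
            total_words += len(output.split())
            # Simplistic scoring: case-insensitive substring match.
            if expected.lower() in output.lower():
                correct += 1
        n = len(cases)
        reports.append(
            ModelReport(name, correct / n, total_latency / n, total_words / n)
        )

    # Most accurate first; ties broken by lower latency.
    return sorted(reports, key=lambda r: (-r.accuracy, r.avg_latency_s))
```

In practice each callable would wrap whatever client you use to reach a model, and the substring check would be replaced by a scoring method that fits your task.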

ShoddyAd9869@reddit

Hey, I think it's more of a trial and find out thingie. Many platforms now let you test your prompt on various models at the same time. Then, using metrics such as latency, cost, tokens used, speed, and accuracy, you can compare the results and decide on the model that best suits you. I hope this was helpful.

AstroZombie138@reddit

I'm interested as well, but personally I start with several large models, not worrying about inference performance, see which one works best, and then go smaller on the quantization until I find the right balance of performance vs. accuracy.
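
One way to make that "go smaller until the balance tips" step concrete is sketched below. It assumes you have already measured an accuracy score for each quantized variant on your own test set; `pick_quantization`, the `(label, size_gb, accuracy)` tuple layout, and the default tolerance are hypothetical choices for illustration only.

```python
from typing import List, Tuple

# (label, size on disk in GB, measured accuracy in [0, 1]); layout is an assumption.
Variant = Tuple[str, float, float]


def pick_quantization(variants: List[Variant], max_accuracy_drop: float = 0.02) -> str:
    """Return the smallest variant whose accuracy stays within
    max_accuracy_drop of the largest variant, walking from large to small."""
    if not variants:
        raise ValueError("need at least one variant")

    # Largest first: the biggest variant serves as the accuracy baseline.
    ordered = sorted(variants, key=lambda v: v[1], reverse=True)
    baseline_accuracy = ordered[0][2]
    chosen = ordered[0]

    for variant in ordered[1:]:
        if baseline_accuracy - variant[2] > max_accuracy_drop:
            break  # accuracy fell too far; stop shrinking
        chosen = variant

    return chosen[0]
```

The early `break` mirrors the manual process: once one step down costs too much accuracy, smaller variants are not considered. The tolerance is something you would tune for your own use case.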

stuckinmotion@reddit

..and then I go back to qwen3, lol

SAPPHIR3ROS3@reddit

HF research based on Reddit sentiment and popularity; it usually doesn't take long to choose. Normally the only factor I take into consideration is instruction following (for its size), but sometimes I value the root language (e.g. Chinese for Qwen) and the speed I can achieve. As for testing multiple models or sticking to one, it depends, but normally I tend to stick with one and create different system prompts.

SnooMarzipans2470@reddit

I have the same question. I think a lot of people here are seasoned experts, so they know what's out there and they learn about new models when they drop. Reading the top past threads from this sub has helped me get more familiar with the top models out there. I'm still learning, though.