Blog: AI evals are becoming the new compute bottleneck
Posted by evijit@reddit | LocalLLaMA | 6 comments
Hi! I wanted to share my new blog post on the costs of running AI evals. We dig into how benchmarking frontier systems now routinely costs tens of thousands of dollars per run, why agent evals are especially unpredictable, and what the resulting concentration of validation authority means for the broader research community.
9gxa05s8fa8sh@reddit
I love AI research. The studies and benchmarks are awesome, and the best stuff isn't popular yet.
iMakeSense@reddit
What other best stuff isn't popular?
9gxa05s8fa8sh@reddit
I recently enjoyed the mapcoder-lite study: https://www.reddit.com/r/LocalLLaMA/comments/1symfop/study_2x_coding_performance_of_7b_model_without/
but 500 studies go up on arXiv a day, so you can have at it:
https://arxiv.org/list/cs/recent?skip=0&show=50
abnormal_human@reddit
Evals are brutal, and honestly one of the best arguments for local AI today, since they're a full-utilization, parallel workload that can saturate a workstation while also doing valuable work.
lorddumpy@reddit
Just a cool $30,000+ lol
abnormal_human@reddit
Business. Expense. :)