How to benchmark local LLM

Posted by badabimbadabum2@reddit | LocalLLaMA | View on Reddit | 1 comments

planning to put AI on my website so that users can either ChAT, summarize content or do smt else. To find suitable GPU, I want to know how many users one GPU could serve simultaneously. So for example if same time 20 users asks a question from the local LLM, how fast a 4090 can serve the output to the users. So is there a test which could simulate user demand like websites tests? I might be able to do smt with locust but it would need some work. Anyone knows?