Which LLM (or SLM?) can I use as a benchmark when targeting resource-constrained edge devices? (INT8-quantised, 100M-200M parameters)

Posted by neuroticnetworks1250@reddit | LocalLLaMA | 7 comments

I am currently building on an open-source repo with a RISC-V controller and a vector unit; the design now also incorporates a tightly coupled matrix unit. I might also add a dedicated Softmax unit if computing Softmax with RVV instructions becomes a bottleneck. Is there a list of models, perhaps on Hugging Face, that we can use as benchmarking options? Associated papers would be good.
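For a rough sense of scale, here is a back-of-envelope sketch of the weight storage for that parameter range. It assumes exactly 1 byte per INT8 weight and ignores activations, KV cache, and quantisation scales/zero-points, so the real footprint will be somewhat larger. The function name is just illustrative.

```python
def int8_weight_footprint_mib(num_params: int, bytes_per_weight: int = 1) -> float:
    """Approximate weight storage in MiB (weights only).

    Assumption: every parameter is stored as one INT8 value; scales,
    zero-points, activations and KV cache are not counted.
    """
    return num_params * bytes_per_weight / (1024 ** 2)


if __name__ == "__main__":
    # The two ends of the 100M-200M parameter range from the title.
    for n in (100_000_000, 200_000_000):
        print(f"{n // 1_000_000}M params -> {int8_weight_footprint_mib(n):.1f} MiB of weights")
```

So the weights alone land at roughly 100 to 200 MB, which is the budget the target device's memory would need to cover before anything else.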