Small and fast function calling models
Posted by Jumper775-2@reddit | LocalLLaMA | 43 comments
I'm currently writing a tool that needs function calling with responses as close to real time as possible. The output quality doesn't matter a whole lot; the model just needs to guide the user through choosing one of the options it gets back after a function call. I'm currently trying a TinyLlama finetune on my 7900x using llama-cpp-python, but it takes a minute and a half to get a response. That is too long: the most I can realistically wait is 10 seconds.
What do y'all recommend?
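For reference, the flow described above could be sketched roughly like this. It is only an illustration under assumptions, not the actual tool: `build_prompt`, `parse_choice`, `timed_choice`, and the `{"choice": N}` reply format are all hypothetical names and conventions made up for the example, and `generate` is a stand-in for whatever backend produces text (for instance a wrapper around a llama-cpp-python model). The idea is to keep the model's output as short as possible (a single index) and to measure each call against the 10-second budget.

```python
import json
import time
from typing import Callable, Optional, Sequence, Tuple

LATENCY_BUDGET_S = 10.0  # the 10-second ceiling mentioned in the post


def build_prompt(options: Sequence[str], user_message: str) -> str:
    """Ask for a tiny JSON reply so the model generates very few tokens."""
    numbered = "\n".join(f"{i}: {opt}" for i, opt in enumerate(options))
    return (
        "Pick the option that best matches the user's request.\n"
        f"Options:\n{numbered}\n"
        f"User: {user_message}\n"
        'Reply with JSON only, e.g. {"choice": 0}\n'
    )


def parse_choice(raw: str, options: Sequence[str]) -> Optional[str]:
    """Return the chosen option, or None if the reply is not usable."""
    try:
        index = json.loads(raw.strip())["choice"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(index, bool) or not isinstance(index, int):
        return None
    if 0 <= index < len(options):
        return options[index]
    return None


def timed_choice(
    generate: Callable[[str], str],  # hypothetical: prompt in, raw text out
    options: Sequence[str],
    user_message: str,
) -> Tuple[Optional[str], float]:
    """Run one selection round and report how long the model call took."""
    start = time.perf_counter()
    raw = generate(build_prompt(options, user_message))
    elapsed = time.perf_counter() - start
    return parse_choice(raw, options), elapsed


if __name__ == "__main__":
    # Fake backend for demonstration; a real one would call the model.
    def fake_generate(prompt: str) -> str:
        return '{"choice": 1}'

    choice, elapsed = timed_choice(fake_generate, ["red", "green"], "I want green")
    status = "within" if elapsed <= LATENCY_BUDGET_S else "over"
    print(f"choice={choice!r}, {elapsed:.3f}s ({status} budget)")
```

Swapping `fake_generate` for a real model call makes it easy to see whether the time goes into prompt processing or generation, since the reply itself is only a few tokens long.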
43 Comments
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Omnic19@reddit
Jumper775-2@reddit (OP)
Paulonemillionand3@reddit
Jumper775-2@reddit (OP)
Paulonemillionand3@reddit
Jumper775-2@reddit (OP)
ramzeez88@reddit
Jumper775-2@reddit (OP)
Pedalnomica@reddit
Jumper775-2@reddit (OP)
jackshec@reddit
Jumper775-2@reddit (OP)
phree_radical@reddit
Jumper775-2@reddit (OP)
phree_radical@reddit
Jumper775-2@reddit (OP)
phree_radical@reddit
Jumper775-2@reddit (OP)
ramzeez88@reddit
Jumper775-2@reddit (OP)
KaiwenKHB@reddit
Jumper775-2@reddit (OP)
remyxai@reddit
vasileer@reddit
Jumper775-2@reddit (OP)