How do llama.cpp or other implementations handle tokenization without tiktoken?

Posted by EricHermosis@reddit | LocalLLaMA

Hi! I built my own tensor library in C++ and got Llama 3 running on it. I set up a simple socket server that can send and receive tensors from a Python client, so I tokenize with tiktoken on the Python side, send the tensor to my C++ transformer, and get the result back.

I'm getting good results with Llama 3 1B, decent considering I haven't made any optimizations yet, but I'd like to get rid of Python and do everything in C++. The problem is that tiktoken is Rust/Python. What do you think I should do? Implement it from scratch, look for someone else's implementation, or try to use the original Rust version? How do llama.cpp and other LLM implementations handle this?
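For context on the "from scratch" option: as far as I can tell, llama.cpp does not use tiktoken at all. It ships its own tokenizer code in C++ (exposed through `llama_tokenize`) and loads the vocabulary and BPE merge data from the GGUF model file, so the encoder could in principle be reimplemented directly. Below is a rough sketch of the core greedy merge loop such a tokenizer would need, assuming the merge ranks and token→id vocab have already been exported from the tiktoken files; the container names and the loading/pre-tokenization steps are placeholders, not llama.cpp's actual API.

```cpp
// Minimal sketch of rank-based BPE merging, the core of a from-scratch
// C++ tokenizer. Assumes the merge table (pair -> rank) and the vocab
// (token string -> id) were exported from the tiktoken model beforehand.
#include <climits>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical containers, filled once from exported files.
using MergeRanks = std::map<std::pair<std::string, std::string>, int>;
using Vocab      = std::map<std::string, int32_t>;

// Greedy BPE: repeatedly merge the adjacent pair with the lowest rank
// until no mergeable pair remains.
std::vector<int32_t> bpe_encode(const std::string& piece,
                                const MergeRanks& ranks,
                                const Vocab& vocab) {
    // Start from single bytes. Real byte-level BPE first remaps raw bytes
    // to printable unicode and splits the input with a regex pre-tokenizer;
    // both details are skipped here.
    std::vector<std::string> parts;
    for (char c : piece) parts.emplace_back(1, c);

    while (parts.size() > 1) {
        int best_rank = INT_MAX;
        size_t best_i = 0;
        for (size_t i = 0; i + 1 < parts.size(); ++i) {
            auto it = ranks.find({parts[i], parts[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i = i;
            }
        }
        if (best_rank == INT_MAX) break;  // no mergeable pair left
        parts[best_i] += parts[best_i + 1];
        parts.erase(parts.begin() + best_i + 1);
    }

    std::vector<int32_t> ids;
    for (const auto& p : parts) {
        auto it = vocab.find(p);
        if (it != vocab.end()) ids.push_back(it->second);
        // A real tokenizer would need byte-fallback / unknown handling here.
    }
    return ids;
}
```

On top of this loop, a full replacement would still need the byte-to-unicode mapping, the regex pre-tokenization split, and special-token handling, which is roughly the extra machinery llama.cpp carries in its own tokenizer code. The upside is that the vocab/merges only have to be exported once, so there is no Python at inference time.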