An explainer blog on attention, KV-caching, continuous batching
Posted by unofficialmerve@reddit | LocalLLaMA | 1 comment
Hey folks, it's Merve from Hugging Face!
Yesterday we dropped a lengthy blog illustrating cutting-edge inference optimization techniques: continuous batching, KV-caching and more (it also covers attention and everything that led to them, to keep it beginner-friendly)! We hope you like it 🤗
unofficialmerve@reddit (OP)
we have plans to drop more blogs, so let us know which concepts you're curious about!
here it is: https://huggingface.co/blog/continuous_batching