If you limit context to 4k tokens, which models today beat Llama2-70B from 2 years ago?

Posted by EmPips@reddit | LocalLLaMA | 20 comments

Obviously this is a silly question. 4k context is so limiting that even dumber models with longer windows end up "better" for almost any pipeline and use case.
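(If you want to replicate the constraint for a fair comparison, you can pin the window explicitly. A minimal sketch with llama-cpp-python, where the model path and prompt are just placeholders:)

```python
# Hypothetical sketch: hard-capping the context window at 4k tokens
# with llama-cpp-python. Model path and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.Q4_K_M.gguf",  # any local GGUF file
    n_ctx=4096,  # cap the context window at 4k tokens
)

out = llm("Summarize the following notes: ...", max_tokens=256)
print(out["choices"][0]["text"])
```

(The llama.cpp CLI equivalent is the `-c 4096` / `--ctx-size 4096` flag.)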

But for those who have been running local LLMs since then: what are your observations (your real-world experience, outside of benchmark JPEGs)? What model sizes now beat Llama2-70B in: