Recommendations for summarization and structured data extraction

Posted by cachophonic@reddit | LocalLLaMA

Hi all, I’m looking for people’s current favourites/recommendations for models that are great at following instructions for text summarization and structured data extraction.

For a bit of context, the model needs to fit within 48 GB of VRAM, and the use case is largely extracting specific information (e.g. question-and-answer pairs, specific assessment info) and structured JSON data from appointment transcripts. Each generation is usually around 30k tokens, including prompts.
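
To make the extraction task concrete, here is a minimal sketch of the kind of post-processing check this implies: parse the model's JSON and keep only question/answer pairs whose answer text actually appears in the transcript. The schema (`question`, `answer`), the function name, and the verbatim-match rule are all assumptions for illustration, not a description of an actual pipeline.

```python
import json

# Hypothetical schema: each extracted item must carry these string fields.
REQUIRED_KEYS = {"question", "answer"}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't block matches."""
    return " ".join(text.lower().split())


def validate_extraction(raw_output: str, transcript: str) -> list[dict]:
    """Return only well-formed items whose answer occurs in the transcript."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return []  # model did not return valid JSON
    if not isinstance(data, list):
        return []

    norm_transcript = normalize(transcript)
    kept = []
    for item in data:
        # Reject items with missing fields or non-string values.
        if not isinstance(item, dict) or not REQUIRED_KEYS <= item.keys():
            continue
        if not all(isinstance(item[k], str) for k in REQUIRED_KEYS):
            continue
        # Crude grounding check: the answer must be a verbatim span.
        answer = normalize(item["answer"])
        if answer and answer in norm_transcript:
            kept.append(item)
    return kept
```

The verbatim check is deliberately strict, so it will also reject correct but paraphrased answers. It is only useful when the prompt asks the model to quote the transcript.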

Our current go-to is still Mistral 24b Instruct at FP8 running in vLLM.
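
As a rough sanity check on the 48 GB budget, the back-of-the-envelope arithmetic looks something like the sketch below. The layer count, KV head count, head dimension, and 16-bit KV cache are illustrative placeholders, not confirmed values for any particular model; the real numbers should come from the model's own config. Activations and runtime overhead are ignored.

```python
def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def kv_cache_gb(n_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_value: int) -> float:
    """KV cache for one sequence: a key and a value tensor per layer."""
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token_bytes * n_tokens / 1e9


# 24B parameters at 1 byte each (FP8).
w = weights_gb(24e9, 1)

# ASSUMED architecture values for illustration only.
kv = kv_cache_gb(n_tokens=30_000, n_layers=40, n_kv_heads=8,
                 head_dim=128, bytes_per_value=2)

headroom = 48 - w - kv
print(f"weights={w:.1f} GB, kv={kv:.2f} GB, headroom={headroom:.2f} GB")
```

Under those assumptions that comes to about 24 GB of weights plus roughly 4.9 GB of KV cache per 30k-token sequence, so the remaining headroom mostly determines how many transcripts can be processed concurrently.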

This is a production project, so the priority is accuracy, instruction following, and avoiding confabulation over raw t/s.
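
For comparing candidates on accuracy rather than speed, something like the following could score each model's extracted fields against a small hand-labeled set. This is a minimal sketch: the function names, the field names in the usage example, and the exact-match scoring rule are all assumptions, and a real evaluation would likely need fuzzier matching for free-text fields.

```python
def field_accuracy(predicted: dict, gold: dict) -> float:
    """Fraction of gold fields that the prediction reproduces exactly."""
    if not gold:
        return 1.0
    hits = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return hits / len(gold)


def rank_models(outputs_by_model: dict[str, list[dict]],
                gold: list[dict]) -> list[tuple[str, float]]:
    """Average per-document field accuracy per model, best first."""
    scores = {}
    for name, outputs in outputs_by_model.items():
        per_doc = [field_accuracy(p, g) for p, g in zip(outputs, gold)]
        scores[name] = sum(per_doc) / len(per_doc) if per_doc else 0.0
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical labeled example and model outputs.
gold = [{"patient_age": 42, "follow_up": True}]
outputs = {
    "model_a": [{"patient_age": 42, "follow_up": True}],
    "model_b": [{"patient_age": 42, "follow_up": False}],
}
ranking = rank_models(outputs, gold)
```

Extra fields that a model invents are not penalized by this metric, so a separate check for unexpected keys would be needed to catch that form of confabulation.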

We tried several other models, like gpt-oss-20b, Qwen3-30B-A3B, and several smaller Qwen models, when we initially got started. It's hard to keep up with all the changes, though, so I thought I'd see if people have particular go-tos so we can shorten the list of models to experiment with. Thanks!