Intel's OpenVINO 2025.2 Brings Support For New Models, GenAI Improvements
Posted by FastDecode1@reddit | LocalLLaMA | View on Reddit | 1 comments
Calcidiol@reddit
I'm interested to see what anyone might comment about the functionality & performance of this and the other ways to run inference on small (4B-32B or whatever) dense / MoE LLMs with ARC 7/BM these days.
I wasn't getting very good results lately (before this OV release) with llama.cpp / sycl / vulkan on ARC7; ipex-llm was significantly better than sycl-llama.cpp in some cases, but still not glorious performance compared to what I'd think possible.
I haven't tried OV, HF transformers, or ONNX on ARC7 lately, so I'm wondering where they all stand relative to each other in performance for the same (or similar) model and quantization, and whether there are particular tuning / optimization choices that significantly help these days for ARC LLM inference.
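For context, a minimal sketch of what trying OpenVINO GenAI on an Arc GPU looks like, assuming the model has already been exported with optimum-cli; the model path and generation parameters below are placeholders, not a benchmark recipe:

```python
# Export step (run once, outside Python), e.g. an int4-quantized export:
#   optimum-cli export openvino --model <hf_model_id> --weight-format int4 ./ov_model
import openvino_genai as ov_genai

# Load the exported model onto the Intel GPU (Arc) device.
pipe = ov_genai.LLMPipeline("./ov_model", "GPU")

# Generate text; max_new_tokens is an illustrative value.
print(pipe.generate("What is OpenVINO?", max_new_tokens=128))
```

This is just the basic LLMPipeline flow from the OpenVINO GenAI docs, not a tuned setup; actual throughput comparisons against llama.cpp/sycl/vulkan or ipex-llm would still depend on quantization format and driver/runtime versions.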