LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild
Posted by isr_431@reddit | LocalLLaMA | 8 comments
The team behind LLaVA has released three new multimodal models: one built on [LLaMA3 8B](https://huggingface.co/lmms-lab/llama3-llava-next-8b) and two built on Qwen-1.5, at [72B](https://huggingface.co/lmms-lab/llava-next-72b) and [110B](https://huggingface.co/lmms-lab/llava-next-110b). From the [blog post](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/):
>**Today, we expanded the LLaVA-NeXT with recent stronger open LLMs**, reporting our findings on more capable language models:
>**Increasing multimodal capabilities with stronger & larger language models, up to 3x model size.** This allows LMMs to present better visual world knowledge and logical reasoning inherited from LLM. It supports LLaMA3 (8B) and Qwen-1.5 (72B and 110B).
>**Better visual chat for more real-life scenarios, covering different applications.** To evaluate the improved multimodal capabilities in the wild, we collect and develop new evaluation datasets, [LLaVA-Bench (Wilder)](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/#3-llava-bench-wilder), which inherit the spirit of [LLaVA-Bench (in-the-wild)](https://github.com/haotian-liu/LLaVA/blob/main/docs/LLaVA_Bench.md) to study daily-life visual chat and enlarge the data size for comprehensive evaluation.