Are there multimodal code models?

Posted by hapliniste@reddit | LocalLLaMA

Hi, I'm looking for a small multimodal LLM specialized in code. I'm not satisfied with the big API models on image-to-code tasks: they clearly don't pick up the fine design details, and they only roughly understand the structure of a website rather than its colors, shapes and similar visual properties. I'd like to see if I can finetune a small model on image-code pairs to create a designer model that can write the CSS needed to replicate a website, including the colors, margins, border radius and more. It feels like no one is using image-code pairs in their finetunes even though the data is readily available. Can someone recommend a multimodal LLM specialized in code? And maybe point me to the best finetuning repo you know of, because I haven't finetuned any model so far.
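To make "image-code pairs" concrete, here is a rough sketch of how I imagine storing them, one JSON record per line. The field names (`image_path`, `prompt`, `code`), the prompt wording and the file paths are all my own assumptions for illustration, not a format required by any particular model or finetuning repo.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class ImageCodePair:
    """One training example: a screenshot and the code that renders it."""
    image_path: str  # hypothetical path to a screenshot of the rendered page
    prompt: str      # instruction given to the model alongside the image
    code: str        # target CSS the model should learn to produce


def make_pair(image_path: str, css: str) -> ImageCodePair:
    # The prompt text is an assumption; any consistent instruction would do.
    prompt = (
        "Write the CSS that reproduces this design, "
        "including colors, margins and border radius."
    )
    return ImageCodePair(image_path=image_path, prompt=prompt, code=css.strip())


def write_jsonl(pairs, out_path) -> int:
    """Write one JSON object per line and return the number of records."""
    count = 0
    with Path(out_path).open("w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(asdict(pair)) + "\n")
            count += 1
    return count


def read_jsonl(path):
    """Load the records back as plain dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each record would come from rendering a page, saving the screenshot, and keeping the CSS that produced it, e.g. `make_pair("shots/card.png", ".card { border-radius: 8px; }")`. Whatever finetuning repo gets recommended would probably need its own conversion step from this layout.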