What is the best model for generating images?
Posted by rez45gt@reddit | LocalLLaMA | 20 comments
Hi guys, now that GPT can generate images, several ideas came to mind, but I want to do everything locally. What is the best AI model for generating images locally, and what are the hardware requirements? I've heard about Stable Diffusion, and it's currently the solution I have in mind, but I wanted to know if you know of a better one. Thanks, guys!
Healthy-Nebula-3603@reddit
Are you serious?
Currently, native GPT-4o.
Nextil@reddit
For realism, Flux or Wan2.1. Flux is faster because it's CFG-distilled, and it may have higher fidelity because it's trained only on images. Wan2.1 is more recent; it's a text/image-to-image/video model. It's not distilled, so negative prompts work well, but that makes it a bit slower. Its image fidelity tends to be a bit lower, probably because it's trained on a mix of images and video, with the videos being lower resolution and probably lower quality. However, its prompt adherence is by far the best of all the open-source models right now.
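A minimal sketch of why CFG distillation matters for speed, assuming the standard classifier-free guidance formula (the function names here are hypothetical, not from any library): a non-distilled model runs the denoiser twice per step, once with the positive prompt and once with the negative (or empty) prompt, then blends the two predictions. A CFG-distilled model bakes the guidance into a single pass, which is also why there is no negative branch for a negative prompt to act on.

```python
from typing import List


def cfg_combine(pred_negative: List[float], pred_positive: List[float],
                guidance_scale: float) -> List[float]:
    """Classifier-free guidance: start from the negative/unconditional
    prediction and move toward the positive-prompt prediction."""
    return [n + guidance_scale * (p - n) for n, p in zip(pred_negative, pred_positive)]


def forward_passes(num_steps: int, uses_cfg: bool) -> int:
    """Denoiser calls per image: two per step with CFG, one per step when
    guidance has been distilled into the model."""
    return num_steps * (2 if uses_cfg else 1)


if __name__ == "__main__":
    print(cfg_combine([0.0, 1.0], [1.0, 1.0], 5.0))  # [5.0, 1.0]
    print(forward_passes(30, uses_cfg=True), forward_passes(30, uses_cfg=False))  # 60 30
```

With guidance_scale = 1.0 the result is exactly the positive prediction, so the negative branch stops mattering, and the extra pass is pure cost.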
laurentbourrelly@reddit
I agree that Wan2.1 is truly impressive.
I'm running it on a Mac Studio.
Serprotease@reddit
What is the performance like on a Mac Studio? Are you using the full 14B model?
laurentbourrelly@reddit
No.
I ordered the new Mac Studio, so maybe 14B will be possible then.
Psychological_Cry920@reddit
How do you run it on your Mac? A desktop app or something?
laurentbourrelly@reddit
I use ComfyUI, but I've had to stay away from 14B (I need to wait for my new Mac Studio).
ihaag@reddit
Unfortunately, none match GPT-4o's image generation. Janus Pro does get closer than Stable Diffusion, in my opinion.
laurentbourrelly@reddit
I'm tired of the hype around GPT-4o.
It's a great Swiss Army Knife, but it's no good for professional use.
Generate 50 images with consistent face, and I'll change my mind.
And wait for the API price. OpenAI makes a good drug to create junkies, then gets them used to insanely high prices and bets on the friction of changing habits.
We didn't wait for ChatGPT to suck less at ImageGen to work professionally at generating images.
The only impressive feature is text-on-image (cool binding tech). Everything else can be done with other tools.
I can already pick up on the style of images produced by ChatGPT.
Again, it's a good personal assistant for do-it-all images, but not a specialized professional image-generation tool.
candreacchio@reddit
I found GPT-4o's consistency to be pretty good. Yes, some details change, but of all the image generators, it is proving to be more consistent than most diffusion models.
laurentbourrelly@reddit
We will agree to disagree on that one.
Generating images with a consistent face is part of my workload.
Amgadoz@reddit
What's a good open model for generating Ghibli-style still portraits of people?
Serprotease@reddit
Flux, img2img with a Ghibli LoRA.
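For anyone wondering what "with a Ghibli LoRA" does under the hood: a LoRA stores a pair of small low-rank matrices per adapted layer, and applying it adds their scaled product to the base weight. This sketch assumes the common alpha/rank scaling plus a user-facing strength multiplier (the kind of slider most UIs expose); the helper names are hypothetical, and it uses plain Python lists instead of a real tensor library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight: Matrix, up: Matrix, down: Matrix,
               alpha: float, rank: int, strength: float = 1.0) -> Matrix:
    """Return W + strength * (alpha / rank) * (up @ down).

    weight: d_out x d_in base weight
    up:     d_out x rank
    down:   rank x d_in
    """
    delta = matmul(up, down)
    scale = strength * alpha / rank
    return [[weight[i][j] + scale * delta[i][j] for j in range(len(weight[0]))]
            for i in range(len(weight))]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    up = [[1.0], [0.0]]    # 2 x 1
    down = [[0.0, 2.0]]    # 1 x 2
    print(merge_lora(base, up, down, alpha=1.0, rank=1, strength=0.5))
    # [[1.0, 1.0], [0.0, 1.0]]
```

Because the update has rank `rank`, the adapter file stays small, and lowering `strength` blends the style in more gently.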
wonderfulnonsense@reddit
I don't think there are any LLMs that generate images yet. Facebook had a model (named Chameleon, IIRC) that did this, but they removed the image generation capability before releasing it.
rookan@reddit
Illustrious
Rich_Artist_8327@reddit
Any image generation models for the 7900 XTX?
AgentTin@reddit
What kinds of ideas? For some ideas the best answer is Flux, but others really benefit from Pony.
Mart-McUH@reddit
I suppose it also depends on the purpose. For me, the best local model is still Flux dev (or its various finetunes). I'm not following image generation that closely, though.
As for running it: technically FLUX.dev requires ~36GB of VRAM, but it can run on less, just slower, and not too badly unless your VRAM is really low. There are also FP8 and KV4 (and probably other) quants with a smaller memory footprint at the cost of some quality. Or there's the Schnell variant, which can generate in 4-8 steps instead of the usual 20-40.
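A back-of-the-envelope sketch of why quants shrink the footprint: weight memory is roughly parameter count times bytes per parameter. This assumes the published figure of about 12B parameters for the FLUX.1 transformer and ignores the text encoders, VAE, activations, and quantization overhead (scales, layers kept in higher precision), so real usage is higher; the function name is hypothetical.

```python
# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp16": 2.0,
    "fp8": 1.0,
    "4bit": 0.5,  # ignores per-block scale/zero-point overhead
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    flux_transformer_params = 12e9  # approximate, transformer only
    for dtype in ("bf16", "fp8", "4bit"):
        print(dtype, weight_memory_gb(flux_transformer_params, dtype))
    # bf16 24.0 / fp8 12.0 / 4bit 6.0
```

Sampling time, by contrast, scales roughly with the step count rather than the precision, which is where Schnell's 4-8 steps help.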
rez45gt@reddit (OP)
I'll take a look, thanks!!
SM8085@reddit
Some projects have worked on running SD on small devices, but how much time do you have?
1.x models run fine on my old GPU with 4 GB of VRAM.
I was playing around with trying to make random Clue and found out that Gemma 3 is relatively decent at writing Stable Diffusion prompts, given some prompting of its own.