Model suggestions for image to prompt

Posted by diesel_heart@reddit | LocalLLaMA | View on Reddit | 14 comments

I don't have much knowledge about this stuff. Which is the best model to generate absolutely detailed prompts from both SFW and NSFW images? What prompt should I use with the image to generate the detailed prompt?

[-]

Iory1998@reddit

Use Qwen-3.6-35BA3B or Qwen-3.5-27B based uncensored models. The Heretic, Disrestricted, or the Haushaus versions are all good. The best models at captioning I ran so far. Nothing comes close.

[-]

diesel_heart@reddit (OP)

Have you tried any uncensored qwen-3.5-9b model? Can’t run 35/27b because of hardware limitations.

[-]

Iory1998@reddit

I am not sure. I don't caption uncensored images, but I know these models are very good at that. Why don't you download the 9B and try it for your use cases.

[-]

Candid-Patience-8581@reddit

Use something like BLIP-2 or LLaVA for image-to-text, then pass it through a prompt enhancer like GPT-4V-style setups or even Stable Diffusion interrogators, and honestly Zoice works fine too for quick clean prompts, just tell it “describe this image in extreme detail for AI generation including lighting, textures, camera, and style” and it’ll do the heavy lifting while you pretend you knew what you were doing all along.

[-]

diesel_heart@reddit (OP)

Thanks a lot

[-]

verdooft@reddit

I use text-generating LLMs to create detailed systemprompts for specific tasks "create a detailed systemprompt for vision models, to regenerate given pictures.". My English is not good, i'm sure, you can optimize this.

You could try this:

https://huggingface.co/lolzinventor/Qwen3.5-4B-Base-ZitGen-V1

"The dataset (images + prompts) was generated entirely by LLMs tasked with regenerating a target image"

There are V2 models too, at the moment without descriptions and no quants.

Vision model for SFW and NSFW:

https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive

Good luck.

[-]

tophlove31415@reddit

I'll second this and point out there is now a qeen3.6-35B that is also very good. I use it to have my agent iterate on 3d print designs so it can get visual feedback for the next iterations. It's really impressive how much spacial understanding it has.

[-]

andrewh2000@reddit

Do you mean you get the agent to look at photographs of a 3d printed object? Or it looks at a rendering before printing?

[-]

tophlove31415@reddit

It builds the objects in openscad usually, then renders a preview image, looks at it and tries to decide what changes are needed to actualize the original objective, then it makes some tweaks to its code, and repeats a handful of times.

It's definitely in its early stages. It reminds me of the time right around when the started building multimodal sota models and I was blown away that they could make simple svg art. This is like that except for 3d printing. I consider it firmly in the creative/artistic stages, and possibly some simple functional pieces. You could continue to iterate and probably come up with some good stuff, though at that point just learning a bit about the design software available would probably be faster.

I'm running qwen 3.6-35b-a3b with the harness something I've built in Python. Any of the sota models could take my message here and help you build something similar I'm sure, if you are interested.

[-]

andrewh2000@reddit

That's very interesting. I've done a couple of one off experiments trying to get an LLM to generate an openscad model and it wasn't very successful. I also got one to generate a script that could be loaded into Fusion 360 and again it didn't work. Sounds like you're having better luck.

[-]

verdooft@reddit

Yes, i use this too, but have not tested it with NSFW.

[-]

diesel_heart@reddit (OP)

Thanks a lot

[-]

NumerousBranch1878@reddit

yeah getting detailed prompts without needing a complicated setup can be tricky. i ran into the same issue with tools needing a lot of tweaking. been using Modelsify for some stuff and it’s been easier to work with so far, handles image to prompt a little more smoothly in my experience

[-]

MaxKruse96@reddit

https://huggingface.co/fancyfeast/llama-joycaption-beta-one-hf-llava