Llama Image Tagger: A project I made to help me sort thousands of images

Posted by Eisenstein@reddit | LocalLLaMA | View on Reddit | 18 comments

This is a project I have been working on that I thought I would share with the community.

What is it?

It will take folders of images and it will create keywords for them and put them into the image metadata. There is no database (except to keep track of what files were processed) so you can move them wherever you want and the metadata stays. You can use any program or app that can read image metadata to sort them, search them, or categorize them. It can also provide full captions/descriptions.

It does this completely locally. It will download the gguf model weights from hugginface and then run on your machine using a single exec and some scripts. It does not reside in your system or install anything but a few small python libraries for image handling and a small program to write the metadata. Run it and delete it and it is gone.

Why is it?

I do a lot of electronics repair. An important part of any repair is documenting the steps so that when you put the thing back together you can reference them. This means I have several tens of gigabytes of folders full of generically named pictures of random circuit boards. This is combined normal pictures that I care about, as well as hundreds (thousands?) of screenshots and various other crap that I have been dumping into folders for over a decade.

I thought to myself -- this is a solved problem, I am sure!

I looked at all the various professional image software that catalogs. The problem is that I am not a photographer and I don't care about any of the features besides 'sort and label my pictures'.

I looked at the options that non-professional people use to catalog photographs. The problem is that I don't want to put all my stuff in someone else's computer, and narry a non-cloud option can be found (yes, I know about that one everyone loves that has breaking updates every week but that is just a cloud service but on your own network which is not what I want).

I looked at solutions provided by the SD/Image Gen crowd. The problem is that my images are not porn. And here we are.

This project is composed of four components:

  1. KoboldCpp runs the backend: It is one executable, it is updated frequently, the dev actively listens to community feedback, and when you are done using it, your system is the same as before -- no stupid random hidden directories filled with hundreds of gigs of model weights you already had, no docker crap, no python dependency hell, -- it even will download the model weights you need if you specify their location in a config file with the executable

  2. MiniCPM-V 2.6: It is the best vision capable model that can be run as a gguf right now. It is absolutely adequate for these needs

  3. Exiftool: File metatdata is a horrorshow of conflicting standards going back to the early 90s starting with Adobe and turning into a real nightmare in the 2000s when every camera manufacturer and photo software creator decided they would make their own tags and treat everyone else's tags however they felt like. There is a dev who has spent over a decade figuring it all out and as a result we have a tiny but immensely useful program that you can throw metadata at and ignore a lot of the details

  4. The script that does the coordination between these things. This is the only part that I am responsible for

The llmii script does the following:

Example of keyword post-processing

Image01: sedan, roadway, carwash
Image02: car, street, carwash
Image03: car, washing, bigfoot

Expand

Image01: sedan, car, street, roadway, washing, carwash
Image02: sedan, car, street, roadway, washing, carwash
Image03: sedan, car, street, roadway, washing, carwash, bigfoot

DeDupe

Image01: car, roadway, carwash
Image02: car, roadway, carwash
Image03: car, roadway, carwash, bigfoot

In dedupe since street and roadway are tied it would just be the first one that got put on the list.

Note: You do not have to do this! The default is to just leave the keywords alone as they were generated.

Please also note that I am not a great or talented programmer. I am amazed any of this works, so I am happy to take good faith advice and critique and any issues will be looked into, but you should know what you are doing when you run this, but I am sure you are all technically proficient.

If your comment is only going to be 'another project does this' or 'I don't want this' or something else completely useless, write it in the comment box, then close the tab without hitting post. Thanks, but it is rude.