AI File Organizer Update: Now with Dry Run Mode and Llama 3.2 as Default Model
Posted by unseenmarscai@reddit | LocalLLaMA | View on Reddit | 50 comments
I previously shared my AI file organizer project, which reads and sorts files 100% on-device (https://www.reddit.com/r/LocalLLaMA/comments/1fn3aee/i_built_an_ai_file_organizer_that_reads_and_sorts/), and got tremendous support from the community! Thank you!!!
Here's how it works:
Before:
/home/user/messy_documents/
├── IMG_20230515_140322.jpg
├── IMG_20230516_083045.jpg
├── IMG_20230517_192130.jpg
├── budget_2023.xlsx
├── meeting_notes_05152023.txt
├── project_proposal_draft.docx
├── random_thoughts.txt
├── recipe_chocolate_cake.pdf
├── scan0001.pdf
├── vacation_itinerary.docx
└── work_presentation.pptx
0 directories, 11 files
After:
/home/user/organized_documents/
├── Financial
│ └── 2023_Budget_Spreadsheet.xlsx
├── Food_and_Recipes
│ └── Chocolate_Cake_Recipe.pdf
├── Meetings_and_Notes
│ └── Team_Meeting_Notes_May_15_2023.txt
├── Personal
│ └── Random_Thoughts_and_Ideas.txt
├── Photos
│ ├── Cityscape_Sunset_May_17_2023.jpg
│ ├── Morning_Coffee_Shop_May_16_2023.jpg
│ └── Office_Team_Lunch_May_15_2023.jpg
├── Travel
│ └── Summer_Vacation_Itinerary_2023.docx
└── Work
├── Project_X_Proposal_Draft.docx
├── Quarterly_Sales_Report.pdf
└── Marketing_Strategy_Presentation.pptx
7 directories, 11 files
I read through all the comments and worked on implementing changes over the past week. Here are the new features in this release:
v0.0.2 New Features:
- Dry Run Mode: Preview sorting results before committing changes (see the sketch after this list)
- Silent Mode: Save logs to a text file for quieter operation
- Expanded file support: .md, .xlsx, .pptx, and .csv
- Three sorting options: by content, date, or file type
- Default text model updated to Llama 3.2 3B
- Enhanced CLI interaction experience
- Real-time progress bar for file analysis
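For those curious what dry run looks like in practice, here is a minimal sketch of the idea; the function and argument names are illustrative, not the project's actual API:

```python
import shutil
from pathlib import Path

def organize(plan: dict[Path, Path], dry_run: bool = True) -> None:
    """Apply a move plan mapping each source file to its proposed destination."""
    for src, dst in plan.items():
        if dry_run:
            # Preview only: report the proposed move without touching the disk.
            print(f"[dry run] {src} -> {dst}")
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))

plan = {
    Path("messy_documents/budget_2023.xlsx"):
        Path("organized_documents/Financial/2023_Budget_Spreadsheet.xlsx"),
}
organize(plan, dry_run=True)  # inspect the preview, then rerun with dry_run=False
```

The point is that the move plan is computed either way; dry run just swaps the final move for a printout.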
For the roadmap and download instructions, check the stable v0.0.2: https://github.com/NexaAI/nexa-sdk/tree/main/examples/local_file_organization
For incremental updates with experimental features, check my personal repo: https://github.com/QiuYannnn/Local-File-Organizer
Credit to the Nexa team for featuring this project in their official cookbook and offering tremendous support on this new version. Executables for the whole project are on the way.
What are your thoughts on this update? Is there anything I should prioritize for the next version?
Thank you!!
NeonHD@reddit
Holy heck this tool could do wonders in organizing my huge porn folder
LazyOnPromethazin@reddit
Finally got it installed. Too bad it only works in English. It recognizes my German documents but then translates them.
Wrong_Koala_8557@reddit
Hey, when I installed this script and tried to run it, it started downloading a 3.5 GB file named model-q4_0.gguf. Need some help plsss :>
No-Bathroom5029@reddit
Can you please make a version that sorts folders? I have a massive collection of PLR products that are not categorized. I'd love for it to be able to sort the folders (with their contents) into searchable directories, e.g. mindset, relationships, family, pets, etc.
Bravecom@reddit
It's so slow.
Iory1998@reddit
A while ago, someone created an Everything-style image search engine that can find images based on descriptions. Why don't you add this feature to your project, or merge it with that project? It would be highly interesting to sort files based on similarities or content too!
https://github.com/0ssamaak0/CLIPPyX
dasnihil@reddit
Make one that does image classification and adds meta tags like "food, travel, beach, sky" to your images so searches can be smarter. We don't need Google Photos for this anymore; everything local, power to the people.
The_frozen_one@reddit
There's actually a pretty cool open-source project called immich (https://immich.app/) that is basically self-hosted Google Photos. It has automatic classification like you're talking about, plus all the other goodies that you would expect from a photo library (facial recognition).
They use CLIP for image classification, which should work well for the kinds of searches you were asking about (and probably a lot faster than using an 11B vision model).
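If you only want local tags rather than a whole app, a rough sketch of zero-shot CLIP tagging with the transformers library could look like this (the model name and tag list are just examples):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

tags = ["food", "travel", "beach", "sky", "documents", "people"]  # example tag set
image = Image.open("IMG_20230515_140322.jpg")

# Score the image against every candidate tag in one forward pass.
inputs = processor(text=tags, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# Keep the three best-matching tags as searchable metadata.
print(sorted(zip(tags, probs.tolist()), key=lambda p: -p[1])[:3])
```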
unseenmarscai@reddit (OP)
This is definitely something I can do. I'll put that on my list!
ab2377@reddit
hey great work! which model do you plan to use to do image classification?
dasnihil@reddit
Llama 3.2, I believe, has vision capabilities now. I've yet to get it running locally; I'm currently lost in the world of Flux/ComfyUI. I remember my excitement when DALL-E was announced, and that was 100x worse than what I get with Flux. Keep accelerating, bros.
NiceAttorney@reddit
Awesome! I was waiting for Office support before I started playing around with this; it hits nearly all of my use cases. Maybe, a year down the line, we could have Whisper create transcriptions for video files and sort those too.
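A sketch of that Whisper idea, using the open-source openai-whisper package (the model size and file name are examples):

```python
import whisper  # the open-source openai-whisper package

model = whisper.load_model("base")  # small and fast; larger models are more accurate
result = model.transcribe("vacation_video.mp4")  # ffmpeg extracts the audio track
transcript = result["text"]

# The transcript could then be fed to the text model for content-based sorting.
print(transcript[:200])
```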
mrskeptical00@reddit
Are there any benefits to building this with Nexa vs using an OpenAI compatible API that many people are already running?
TeslaCoilzz@reddit
Privacy of the data…?
mrskeptical00@reddit
I didn’t mean use Open AI, I meant Open AI compatible APIs like Ollama, LM Studio, llama.cpp, vllm, etc.
I might be out of the loop a bit, but I've never heard of Nexa, and as cool as this project seems, I don't have any desire to download yet another LLM platform when I'm happy with my current solution.
ab2377@reddit
I just read a little about Nexa. Since they focus on on-device functionality, the tool is supposed to run with whatever is hosting the model on-device, so you don't require the user to first configure and host a model (on Ollama/LM Studio) and call it through APIs; that's how I understood it, anyway. But going through their SDK, they do have a server with OpenAI-compatible APIs (https://docs.nexaai.com/sdk/local-server). I don't know what they are using for inference, but they support the GGUF format, so maybe some llama.cpp is in there somewhere. I should read more.
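For anyone wanting to point an OpenAI-compatible client at such a server, a minimal sketch looks like this (the base URL, API key, and model id are assumptions; match them to whatever your server exposes):

```python
from openai import OpenAI

# Both values are assumptions; point them at your own local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama3.2",  # whatever model id your server exposes
    messages=[{"role": "user",
               "content": "Suggest a folder name for a 2023 budget spreadsheet."}],
)
print(resp.choices[0].message.content)
```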
mrskeptical00@reddit
If I understand correctly, it saves the step of attaching the tool to an LLM endpoint, which is the step we'd have to do if we were to hook it up to an existing endpoint.
If releasing a retail product, I can see the appeal of using Nexa. On the other hand, for a release to LocalLlama specifically, where most people are running their own endpoints, it might make sense to skip the Nexa bit and just release the plain Python code so we can attach it to our existing setups and maybe test with other LLMs.
If I have time I might run it through my LLM and see if it can rewrite it for me 😂
TeslaCoilzz@reddit
Good point, pardon mate.
sibutum@reddit
Now I need this for mail/Outlook, local and OSS.
unseenmarscai@reddit (OP)
Mail will be my next project. Do you want it to be a browser extension or something in the terminal that can call the Gmail or Outlook APIs?
sibutum@reddit
I think a browser extension would be easier.
gravenbirdman@reddit
Nice! I was literally about to write one of my own. Glad I searched first.
mintybadgerme@reddit
Any ETA on this?
TeslaCoilzz@reddit
Awesome! I've added batching for the whole process by implementing a cache, and I'm currently working on a GUI. I've also sent you main.py with PathCompleter implemented.
crpto42069@reddit
Hi, can you give it a list of destination directories? I put my music here, photos there, etc. Non-hierarchical.
unseenmarscai@reddit (OP)
I see what you mean. Will implement this for the next version.
crpto42069@reddit
Thank you, ser. You're doing beautiful work.
BlockDigest@reddit
Would be really cool if this could be used alongside Paperless-ngx to add tags and organise documents.
unseenmarscai@reddit (OP)
Will look into this!
Not_your_guy_buddy42@reddit
Impressed by how you managed to integrate the community's responses to your first post!
unseenmarscai@reddit (OP)
Thank you! Will continue building with the community!
gaztrab@reddit
Thank you for your contribution!
unseenmarscai@reddit (OP)
Thank you for checking out the project!
Alienanthony@reddit
I see you're using 3.2. Are there plans to use the image capabilities? I have a seriously unlabeled collection of memes; it would be cool if I could have it look at them and determine a name for each, since they usually end up as download 1, 2, 3, etc.
I started doing it myself with categories like existential, educational, blah blah blah. But this would be amazing.
unseenmarscai@reddit (OP)
Will look into a better vision model for the next version. Did you try part of your collection with the current LLaVA 1.6? It has been pretty good in my testing.
Lissanro@reddit
I think one of the best models currently is Molmo 72B (it has a 4-bit quant: https://huggingface.co/SeanScripts/Molmo-72B-0924-nf4).
There is also a Molmo 7B version; its 4-bit quant (https://huggingface.co/cyan2k/molmo-7B-D-bnb-4bit) is said to fit even on a 12GB card. I did not test the small version, but I think it is supposed to be on par with or better than Qwen2-VL of the corresponding size, depending on the task.
Llama 3.2 90B, on the other hand, was quite disappointing: not only is it way too censored (to the point of refusing to identify a well-known person without jailbreaking techniques), but where it worked it was mostly either behind or on par with Molmo, only rarely succeeding where Molmo failed, mostly on tasks that relied more on text generation capabilities than on vision. Llama 90B may have good potential if fine-tuned, though.
I did not try LLaVA 1.6, so I cannot tell how it compares to Molmo, Qwen2-VL, or Llama 3.2.
the_anonymous@reddit
I second LLaVA 1.6. I'm currently developing a 'pokedex' using LLaVA-Mistral 1.6, and I'm getting pretty good responses using llama-cpp with a grammar to get structured JSON.
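For reference, a minimal sketch of that grammar-constrained approach with llama-cpp-python (the model path and schema are just examples):

```python
from llama_cpp import Llama, LlamaGrammar

# A tiny GBNF grammar that forces output of the form {"name": "..."}.
GBNF = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 _-]* "\""
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="llava-v1.6-mistral-7b.Q4_K_M.gguf")  # example path
grammar = LlamaGrammar.from_string(GBNF)

out = llm("Name this creature as JSON:", grammar=grammar, max_tokens=64)
print(out["choices"][0]["text"])  # guaranteed to match the grammar
```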
titaniumred@reddit
So do you need to have a local installation of Llama 3.2 running for this to work?
unseenmarscai@reddit (OP)
Yes. It will pull a quantized version (Q3_K_M) of Llama 3.2 from Nexa SDK when you run the script for the first time.
FreddieM007@reddit
Cool project! Since you are already reading and understanding the content of each file, can you turn it into a search index to enable intelligent, semantic searches?
unseenmarscai@reddit (OP)
Yes, many people have requested local semantic search. Optimizing the performance and indexing structure will be a separate project; I'll look into it for a future version.
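As a rough illustration of what such an index could look like, here is a sketch using sentence-transformers (the model name and file summaries are assumptions, not the project's actual pipeline):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

# Hypothetical per-file summaries from the organizer's content-analysis step.
summaries = {
    "budget_2023.xlsx": "Spreadsheet with 2023 household budget figures",
    "recipe_chocolate_cake.pdf": "Recipe for baking a chocolate cake",
}
paths = list(summaries)
index = model.encode([summaries[p] for p in paths], convert_to_tensor=True)

# Embed a natural-language query and rank files by cosine similarity.
query = model.encode("how much did I spend this year", convert_to_tensor=True)
scores = util.cos_sim(query, index)[0]
print(paths[int(scores.argmax())], float(scores.max()))
```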
shepbryan@reddit
Love seeing the dry run implementation. That was a solid suggestion from the community too
unseenmarscai@reddit (OP)
It actually speeds up the entire process quite a bit : )
bwjxjelsbd@reddit
This is such a good use case for AI.
Thanks for making this and keep on building, sir.
unseenmarscai@reddit (OP)
Thank you! There are many things on the roadmap!
InterstellarReddit@reddit
Good work making it run on-device!
onicarps@reddit
Thank you for this! Will try it out soon.