AI File Organizer Update: Now with Dry Run Mode and Llama 3.2 as Default Model
Posted by unseenmarscai@reddit | LocalLLaMA | View on Reddit | 50 comments
I previously shared my AI file organizer project, which reads and sorts files 100% on-device (https://www.reddit.com/r/LocalLLaMA/comments/1fn3aee/i_built_an_ai_file_organizer_that_reads_and_sorts/), and got tremendous support from the community! Thank you!!!
Here's how it works:
Before:
/home/user/messy_documents/
├── IMG_20230515_140322.jpg
├── IMG_20230516_083045.jpg
├── IMG_20230517_192130.jpg
├── budget_2023.xlsx
├── meeting_notes_05152023.txt
├── project_proposal_draft.docx
├── random_thoughts.txt
├── recipe_chocolate_cake.pdf
├── scan0001.pdf
├── vacation_itinerary.docx
└── work_presentation.pptx
0 directories, 11 files
After:
/home/user/organized_documents/
├── Financial
│ └── 2023_Budget_Spreadsheet.xlsx
├── Food_and_Recipes
│ └── Chocolate_Cake_Recipe.pdf
├── Meetings_and_Notes
│ └── Team_Meeting_Notes_May_15_2023.txt
├── Personal
│ └── Random_Thoughts_and_Ideas.txt
├── Photos
│ ├── Cityscape_Sunset_May_17_2023.jpg
│ ├── Morning_Coffee_Shop_May_16_2023.jpg
│ └── Office_Team_Lunch_May_15_2023.jpg
├── Travel
│ └── Summer_Vacation_Itinerary_2023.docx
└── Work
├── Project_X_Proposal_Draft.docx
├── Quarterly_Sales_Report.pdf
└── Marketing_Strategy_Presentation.pptx
7 directories, 11 files
I read through all the comments and worked on implementing changes over the past week. Here are the new features in this release:
v0.0.2 New Features:
- Dry Run Mode: Preview sorting results before committing changes (see the sketch after this list)
- Silent Mode: Save logs to a text file for quieter operation
- Expanded file support: .md, .xlsx, .pptx, and .csv
- Three sorting options: by content, date, or file type
- Default text model updated to Llama 3.2 3B
- Enhanced CLI interaction experience
- Real-time progress bar for file analysis
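For those curious what dry run looks like in practice, here is a minimal sketch of the idea; the function and argument names are illustrative, not the project's actual API:

```python
import shutil
from pathlib import Path

def organize(plan: dict[Path, Path], dry_run: bool = True) -> None:
    """Apply a move plan mapping each source file to its proposed destination."""
    for src, dst in plan.items():
        if dry_run:
            # Preview only: report the proposed move without touching the disk.
            print(f"[dry run] {src} -> {dst}")
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))

plan = {
    Path("messy_documents/budget_2023.xlsx"):
        Path("organized_documents/Financial/2023_Budget_Spreadsheet.xlsx"),
}
organize(plan, dry_run=True)  # inspect the preview, then rerun with dry_run=False
```

The point is that the move plan is computed either way; dry run just swaps the final move for a printout.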
For the roadmap and download instructions, check the stable v0.0.2: https://github.com/NexaAI/nexa-sdk/tree/main/examples/local_file_organization
For incremental updates with experimental features, check my personal repo: https://github.com/QiuYannnn/Local-File-Organizer
Credit to the Nexa team for featuring this project in their official cookbook and offering tremendous support on this new version. Executables for the whole project are on the way.
What are your thoughts on this update? Is there anything I should prioritize for the next version?
Thank you!!
NeonHD@reddit
Holy heck this tool could do wonders in organizing my huge porn folder
LazyOnPromethazin@reddit
Finally got it installed. Too bad it only works in English. It recognizes my German documents but then translates them.
Wrong_Koala_8557@reddit
Hey, when I installed this script and tried to run it, it started downloading a 3.5 GB file named model-q4_0.gguf. Need some help plsss :>
No-Bathroom5029@reddit
Can you please make a version that sorts folders? I have a massive collection of PLR products that are not categorized. I'd love for it to be able to sort the folders (with their contents) into searchable directories, e.g. mindset, relationships, family, pets, etc.
Bravecom@reddit
It's so slow.
Iory1998@reddit
A while ago, someone created an Everything-style image search engine that can find images based on descriptions. Why don't you add this feature to your project, or merge it with that project? It would be highly interesting to sort files based on similarities or content too!
https://github.com/0ssamaak0/CLIPPyX
dasnihil@reddit
Make one that does image classification and adds meta tags like "food, travel, beach, sky" to your images so searches can be smarter. We don't need Google Photos for this anymore; everything local, power to the people.
The_frozen_one@reddit
There's actually a pretty cool open-source project called immich (https://immich.app/) that is basically self-hosted Google Photos. It has automatic classification like you're talking about, plus all the other goodies that you would expect from a photo library (facial recognition).
They use CLIP for image classification, which should work well for the kinds of searches you were asking about (and probably a lot faster than using an 11B vision model).
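If you only want local tags rather than a whole app, a rough sketch of zero-shot CLIP tagging with the transformers library could look like this (the model name and tag list are just examples):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

tags = ["food", "travel", "beach", "sky", "documents", "people"]  # example tag set
image = Image.open("IMG_20230515_140322.jpg")

# Score the image against every candidate tag in one forward pass.
inputs = processor(text=tags, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# Keep the three best-matching tags as searchable metadata.
print(sorted(zip(tags, probs.tolist()), key=lambda p: -p[1])[:3])
```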
unseenmarscai@reddit (OP)
This is definitely something I can do. I'll put that on my list!
ab2377@reddit
hey great work! which model do you plan to use to do image classification?
dasnihil@reddit
Llama 3.2, I believe, has vision capabilities now. I've yet to get it running locally; I'm currently lost in the world of Flux/ComfyUI. I remember my excitement when DALL-E was announced, and that was 100x worse than what I get with Flux. Keep accelerating, bros.
NiceAttorney@reddit
Awesome! I was waiting for Office support before I started playing around with this; it hits nearly all of my use cases. Maybe, a year down the line, we could have Whisper create transcriptions for video files and sort those too.
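A sketch of that Whisper idea, using the open-source openai-whisper package (the model size and file name are examples):

```python
import whisper  # the open-source openai-whisper package

model = whisper.load_model("base")  # small and fast; larger models are more accurate
result = model.transcribe("vacation_video.mp4")  # ffmpeg extracts the audio track
transcript = result["text"]

# The transcript could then be fed to the text model for content-based sorting.
print(transcript[:200])
```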
mrskeptical00@reddit
Are there any benefits to building this with Nexa vs using an OpenAI compatible API that many people are already running?
TeslaCoilzz@reddit
Privacy of the data…?
mrskeptical00@reddit
I didn’t mean use Open AI, I meant Open AI compatible APIs like Ollama, LM Studio, llama.cpp, vllm, etc.
I might be out of the loop a bit, but I've never heard of Nexa, and as cool as this project seems, I don't have any desire to download yet another LLM platform when I'm happy with my current solution.
ab2377@reddit
I just read a little about Nexa. Since they focus on on-device functionality, the tool is supposed to run with whatever is hosting the model on-device, so you don't require the user to first configure and host a model (on Ollama/LM Studio) and call it through APIs; that's how I understood it, anyway. But going through their SDK, they do have a server with OpenAI-compatible APIs (https://docs.nexaai.com/sdk/local-server). I don't know what they are using for inference, but they support the GGUF format, so maybe some llama.cpp is in there somewhere. I should read more.
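For anyone wanting to point an OpenAI-compatible client at such a server, a minimal sketch looks like this (the base URL, API key, and model id are assumptions; match them to whatever your server exposes):

```python
from openai import OpenAI

# Both values are assumptions; point them at your own local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama3.2",  # whatever model id your server exposes
    messages=[{"role": "user",
               "content": "Suggest a folder name for a 2023 budget spreadsheet."}],
)
print(resp.choices[0].message.content)
```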
mrskeptical00@reddit
If I understand correctly, it saves the step of attaching the tool to an LLM endpoint, which is the step we'd have to do if we were to hook it up to an existing endpoint.
If releasing a retail product, I can see the appeal of using Nexa. On the other hand, for a release to LocalLlama specifically, where most people are running their own endpoints, it might make sense to skip the Nexa bit and just release the plain Python code so we can attach it to our existing setups and maybe test with other LLMs.
If I have time I might run it through my LLM and see if it can rewrite it for me 😂
TeslaCoilzz@reddit
Good point, pardon mate.
sibutum@reddit
Now I need this for mail/Outlook, local and OSS.
unseenmarscai@reddit (OP)
Mail will be my next project. Do you want it to be a browser extension or something in the terminal that can call the Gmail or Outlook APIs?
sibutum@reddit
I think a browser extension would be easier.
gravenbirdman@reddit
Nice! I was literally about to write one of my own. Glad I searched first.
mintybadgerme@reddit
Any ETA on this?
TeslaCoilzz@reddit
Awesome! I've added batching for the whole process by implementing a cache, and I'm currently working on a GUI. I've also sent you main.py with PathCompleter implemented.
crpto42069@reddit
Hi, can you give it a list of destination directories? I put my music here, photos there, etc. Non-hierarchical.
unseenmarscai@reddit (OP)
I see what you mean. Will implement this for the next version.
crpto42069@reddit
Thank you, ser. You're doing beautiful work.
BlockDigest@reddit
Would be really cool if this could be used alongside Paperless-ngx to add tags and organise documents.
unseenmarscai@reddit (OP)
Will look into this!
Not_your_guy_buddy42@reddit
Impressed by how you managed to integrate the community's responses to your first post!
unseenmarscai@reddit (OP)
Thank you! Will continue building with the community!
gaztrab@reddit
Thank you for your contribution!
unseenmarscai@reddit (OP)
Thank you for checking out the project!
Alienanthony@reddit
I see you're using 3.2. Are there plans to use the image capabilities? I have a seriously unlabeled collection of memes; it would be cool if I could have it look at them and determine a name for each, since they usually end up as download 1, 2, 3, etc.
I started doing it myself with categories like existential, educational, blah blah blah. But this would be amazing.
unseenmarscai@reddit (OP)
Will look into a better vision model for the next version. Did you try part of your collection with the current LLaVA 1.6? It has been pretty good in my testing.
Lissanro@reddit
I think one of the best models currently is Molmo 72B (it has a 4-bit quant: https://huggingface.co/SeanScripts/Molmo-72B-0924-nf4).
There is also a Molmo 7B version; its 4-bit quant (https://huggingface.co/cyan2k/molmo-7B-D-bnb-4bit) is said to fit even on a 12GB card. I did not test the small version, but I think it is supposed to be on par with or better than Qwen2-VL of the corresponding size, depending on the task.
Llama 3.2 90B, on the other hand, was quite disappointing: not only is it way too censored (to the point of refusing to identify a well-known person without jailbreaking techniques), but where it worked it was mostly either behind or on par with Molmo, only rarely succeeding where Molmo failed, mostly on tasks that relied more on text generation capabilities than on vision. Llama 90B may have good potential if fine-tuned, though.
I did not try LLaVA 1.6, so I cannot tell how it compares to Molmo, Qwen2-VL, or Llama 3.2.
the_anonymous@reddit
I second LLaVA 1.6. I'm currently developing a 'pokedex' using LLaVA-Mistral 1.6, and I'm getting pretty good responses using llama-cpp with a grammar to get structured JSON.
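For reference, a minimal sketch of that grammar-constrained approach with llama-cpp-python (the model path and schema are just examples):

```python
from llama_cpp import Llama, LlamaGrammar

# A tiny GBNF grammar that forces output of the form {"name": "..."}.
GBNF = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 _-]* "\""
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="llava-v1.6-mistral-7b.Q4_K_M.gguf")  # example path
grammar = LlamaGrammar.from_string(GBNF)

out = llm("Name this creature as JSON:", grammar=grammar, max_tokens=64)
print(out["choices"][0]["text"])  # guaranteed to match the grammar
```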
titaniumred@reddit
So do you need to have a local installation of Llama 3.2 running for this to work?
unseenmarscai@reddit (OP)
Yes. It will pull a quantized version (Q3_K_M) of Llama 3.2 from Nexa SDK when you run the script for the first time.
FreddieM007@reddit
Cool project! Since you are already reading and understanding the content of each file, can you turn it into a search index to enable intelligent, semantic searches?
unseenmarscai@reddit (OP)
Yes, many people have requested local semantic search. Optimizing the performance and indexing structure will be a separate project; I'll look into it for a future version.
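As a rough illustration of what such an index could look like, here is a sketch using sentence-transformers (the model name and file summaries are assumptions, not the project's actual pipeline):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

# Hypothetical per-file summaries from the organizer's content-analysis step.
summaries = {
    "budget_2023.xlsx": "Spreadsheet with 2023 household budget figures",
    "recipe_chocolate_cake.pdf": "Recipe for baking a chocolate cake",
}
paths = list(summaries)
index = model.encode([summaries[p] for p in paths], convert_to_tensor=True)

# Embed a natural-language query and rank files by cosine similarity.
query = model.encode("how much did I spend this year", convert_to_tensor=True)
scores = util.cos_sim(query, index)[0]
print(paths[int(scores.argmax())], float(scores.max()))
```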
shepbryan@reddit
Love seeing the dry run implementation. That was a solid suggestion from the community too
unseenmarscai@reddit (OP)
It actually speeds up the entire process quite a bit : )
bwjxjelsbd@reddit
This is such a good use case for AI.
Thanks for making this and keep on building, sir.
unseenmarscai@reddit (OP)
Thank you! There are many things on the roadmap!
InterstellarReddit@reddit
Good work making it run on-device!
onicarps@reddit
Thank you for this! Will try it out soon.