[New Model] - GyroScope: rotates images correctly

Posted by LH-Tech_AI@reddit | LocalLLaMA | 25 comments

Hey there!
I have made a new model: https://huggingface.co/LH-Tech-AI/GyroScope

So, you just input an image (rotated by 0°, 90°, 180° or 270°) and the model fixes the rotation so the image is upright.

I tested it with lots of photos, and it was almost always correct :D

Final accuracy after 12 epochs of training (~4h on a single T4):

Metric	Value
Overall Val Accuracy	79.81%
Per-class: 0° (upright)	79.8%
Per-class: 90° CCW	80.1%
Per-class: 180°	79.4%
Per-class: 270° CCW	79.8%
Training Epochs	12
Training Time	~4h (Kaggle T4 GPU)
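
For anyone who wants to compute the same kind of numbers on their own validation split, here is a minimal sketch. It assumes labels are encoded as 0 to 3 (number of 90° CCW turns), and `per_class_accuracy` is a hypothetical helper name, not something taken from the GyroScope repo. The toy labels are made up purely for illustration.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return (overall_accuracy, {class: accuracy}) for two paired label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one label")
    # How many samples each class has, and how many of them were predicted correctly.
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / count for label, count in totals.items()}
    return overall, per_class


# Toy labels (assumed encoding: 0 = upright, 1 = 90° CCW, 2 = 180°, 3 = 270° CCW).
true_labels = [0, 0, 1, 1, 2, 2, 3, 3]
predicted_labels = [0, 1, 1, 1, 2, 0, 3, 3]
overall, per_class = per_class_accuracy(true_labels, predicted_labels)
```

Reporting the per-class numbers next to the overall one makes it easy to spot a model that is only good at one orientation.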

Tell me what you think about it :-)

7734128@reddit

That's really cool and useful. Such a straightforward idea, with training data that's easy to create.

I suppose you could also do a similar thing for mirrored images, though it would only be able to succeed when there's text or something similar in the image.
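
To illustrate why the training data is easy to create: starting from photos that are known to be upright, every rotated (or mirrored) variant comes with its label for free. A rough sketch in plain Python, using nested lists as stand-in images; the function names and the label convention (number of 90° CCW turns) are assumptions, not OP's actual pipeline.

```python
def rotate_ccw(image, quarter_turns):
    """Rotate a 2D grid (list of rows) counter-clockwise by quarter_turns * 90 degrees."""
    rotated = [list(row) for row in image]  # copy so the input is not modified
    for _ in range(quarter_turns % 4):
        # Transposing and then reversing the row order is a 90-degree CCW turn.
        rotated = [list(col) for col in zip(*rotated)][::-1]
    return rotated


def make_rotation_examples(upright_image):
    """Return four (image, label) pairs; label = number of CCW quarter turns applied."""
    return [(rotate_ccw(upright_image, k), k) for k in range(4)]


def make_mirror_examples(original_image):
    """Return (image, label) pairs for the mirror task; label 1 = horizontally flipped."""
    flipped = [list(reversed(row)) for row in original_image]
    return [([list(row) for row in original_image], 0), (flipped, 1)]


# Tiny 2x2 "image" standing in for real pixel data.
upright = [[1, 2],
           [3, 4]]
rotation_examples = make_rotation_examples(upright)
mirror_examples = make_mirror_examples(upright)
```

The catch for the mirror task is the one mentioned above: the labels are just as cheap to generate, but many natural photos look equally plausible when flipped.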

LH-Tech_AI@reddit (OP)

Oh, thank you so much 😊 Yes, mirrored images are a bit harder if there is no text in them. But it seems to be possible.

I'll take a look at it 💪🏻

7734128@reddit

Just to be clear, I would not have any use for such a thing, so don't create it on my behalf. I was just musing on the possibility.

LH-Tech_AI@reddit (OP)

Okay, alright 😊
But I'll create it anyway - you gave me the idea, I have time - so let's go 😄
Because I have fun doing it. Will release it on HF: https://huggingface.co/LH-Tech-AI/

LH-Tech_AI@reddit (OP)

It's running on Kaggle :D

ETA: in ~2 hours

We'll see if it's reliable :D

kyrylogorbachov@reddit

What is the point of it? What is the use case? Seems to me like a problem that does not need ML to be solved. Am I missing something?

mikael110@reddit

If you have ever gone through a photo album with hundreds of wrongly rotated images, which I have on many occasions, then I don't think you'd question the use case for such a model.

Sure, you can manually rotate them yourself, but if you can have a literally 11M model do it for you, why wouldn't you? A model like that can run on practically anything. It could even be integrated into an image viewer as an automatic feature without consuming many resources.

kyrylogorbachov@reddit

Fair point - I can see this use case. But in that case, why would you use a model for image-to-image transformation? Why not simply detect the rotation angle you need and apply it algorithmically instead? It's just a waste of compute.

mikael110@reddit

The model OP has posted isn't an image to image model, it's just a simple ResNet model that takes the image as input and outputs the required rotation angle as text. The actual rotation is still done using traditional algorithms of course.

As for why you would use an ML model: mostly because detecting the rotation itself isn't really trivial for arbitrary images. I'm aware there are things like the Hough transform, but that mostly helps on images with well-defined lines that you want to straighten out, like documents, not random photos.

LH-Tech_AI@reddit (OP)

Yes, right 👍🏻 That's why I built it. BTW: thank you for your great support. Community here is really great 😃

the__storm@reddit

It's not an image-to-image model, it's just a classifier (it predicts how many 90-degree rotations - 0, 1, 2, or 3 - are needed to make the image ~upright), basically as you describe.
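
Roughly, the flow described here looks like the sketch below: take the four class scores, pick the highest, and apply that many quarter turns with ordinary array code. The score values, the function names, and the assumption that class k means "rotate k times counter-clockwise" are all illustrative; the real model's label convention may differ.

```python
def rotate_ccw(image, quarter_turns):
    """Rotate a 2D grid (list of rows) counter-clockwise by quarter_turns * 90 degrees."""
    rotated = [list(row) for row in image]  # copy so the input is not modified
    for _ in range(quarter_turns % 4):
        # Transposing and then reversing the row order is a 90-degree CCW turn.
        rotated = [list(col) for col in zip(*rotated)][::-1]
    return rotated


def predicted_quarter_turns(class_scores):
    """Return the index (0-3) of the highest of the four class scores."""
    if len(class_scores) != 4:
        raise ValueError("expected one score per rotation class (4 total)")
    return max(range(4), key=lambda k: class_scores[k])


def correct_orientation(image, class_scores):
    """Apply the predicted number of CCW quarter turns to the image."""
    return rotate_ccw(image, predicted_quarter_turns(class_scores))


# An upright 2x3 grid that has been turned 90 degrees clockwise (= 3 CCW turns).
upright = [[1, 2, 3],
           [4, 5, 6]]
sideways = rotate_ccw(upright, 3)
# Hypothetical classifier scores: class 1 ("needs one CCW turn") wins.
scores = [0.1, 2.3, -0.4, 0.2]
fixed = correct_orientation(sideways, scores)
```

The only learned part is the four scores; the rotation itself is a lossless rearrangement of pixels, so no image generation is involved.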

New_Comfortable7240@reddit

Just in case, you can use something like:

import cv2

# Load the image and rotate it 90° clockwise
img = cv2.imread('image.jpg')
rotated_img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)

Add parallelization and it should run efficiently for any number of images.

mikael110@reddit

If all of them needed the same rotation, sure. The issue is that the album has a mixture of images, some of which require rotation and some of which do not. And the images that do need rotation don't all need the same amount.

That's why being able to detect the rotation amount is useful.

LH-Tech_AI@reddit (OP)

Exactly 💯💪🏻

LH-Tech_AI@reddit (OP)

Right 😄

R_Duncan@reddit

It's a 44 MB model, and ML is the only automatic way, I think.

LH-Tech_AI@reddit (OP)

Yes, I think that too 😊

koljanos@reddit

Try processing images and rotating them based on the EXIF data; it’s pretty messed up.

Good job OP!
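
For context on the EXIF route: the Orientation tag (0x0112) stores a value from 1 to 8, and only four of those are pure rotations. Below is a small sketch of how a tool might use the tag when it is usable and fall back to a classifier otherwise. The function names and the fallback idea are assumptions for illustration, not part of GyroScope.

```python
# EXIF Orientation values that are pure rotations, mapped to the clockwise
# rotation (in degrees) needed to display the stored pixels upright.
# Values 2, 4, 5 and 7 also involve mirroring and are not handled here.
EXIF_ORIENTATION_TO_CW_DEGREES = {1: 0, 3: 180, 6: 90, 8: 270}


def rotation_from_exif(orientation):
    """Return clockwise degrees to rotate, or None if the tag is unusable."""
    return EXIF_ORIENTATION_TO_CW_DEGREES.get(orientation)


def choose_rotation(orientation, model_prediction_cw_degrees):
    """Prefer a usable EXIF value; otherwise fall back to a model's guess.

    Note: this trusts EXIF whenever it is present, which is exactly what
    goes wrong when the tag was reset or was never correct to begin with.
    """
    exif_rotation = rotation_from_exif(orientation)
    if exif_rotation is None:
        return model_prediction_cw_degrees
    return exif_rotation
```

A tag that was stripped by an editor or messaging app shows up as a missing value here, which is the case where a content-based classifier is the only signal left.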

LH-Tech_AI@reddit (OP)

Thank you 🤗🤗🤗

LH-Tech_AI@reddit (OP)

I just test ML models to see how well small models can perform on these tasks. I think it works well and shows how well my model performs. You can use it or not - I don't know if you'd need it... But it works 😊

kyrylogorbachov@reddit

For learning purposes, it’s great to experiment with things you’re passionate about. However, it can be a bit misleading for beginners or less experienced users - someone might choose your model for a use case where it’s not actually needed. This is a problem that traditional algorithms have already solved well for a long time, so it would be helpful to clarify that your approach is mainly for experimentation or fun.

LH-Tech_AI@reddit (OP)

👍🏻

mikael110@reddit

That's funny, I was literally thinking just yesterday about checking if some small VLM like Liquid's recent 350M model would be usable for something like this, as I was organizing some old albums with lots of bad rotations.

But an 11M model is even better, great work :).

LH-Tech_AI@reddit (OP)

Thanks so much 😊