makapuf@reddit
Why are all of these algorithms working on grayscale? Don't you lose useful information by removing color? Or is it simpler to explain? Cheaper sensors?
syklemil@reddit
Another point here is that computer vision isn't just limited to stuff in the human vision range.
E.g. it's pretty common to have cameras that capture infrared data, either to make an alarm go off if the cat jumps on the table during the night, or to detect people waiting for a light at an intersection.
I'd expect there are various use cases for treating the channels separately in both accessibility work and art as well, at which point they can be treated with greyscale algorithms.
New-Anybody-6206@reddit
Any potential benefit from looking at color will never be worth doing three times the work per pixel.
yes_u_suckk@reddit
It depends on what you're trying to do. For things like image similarity it has low importance.
When you decompose the image into YUV, the Y (luma) channel is what gives most of the information about the shapes of the objects in an image. You can look at the picture of the barn in this link: https://en.wikipedia.org/wiki/Y%E2%80%B2UV
So when comparing two images, a lot of algorithms give much more weight to the luma numbers than to the chroma numbers (U and V).
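To make that concrete, here's a rough sketch of the decomposition plus a luma-weighted comparison. The 0.299/0.587/0.114 luma coefficients are the standard BT.601 ones; the 4:1:1 luma/chroma weighting in the distance is just an illustrative choice, not taken from any particular algorithm.

```cpp
#include <cmath>

// One RGB pixel and its Y'UV equivalent (BT.601 coefficients).
struct Rgb { double r, g, b; };   // each channel in [0, 255]
struct Yuv { double y, u, v; };

Yuv rgb_to_yuv(Rgb p) {
    const double y = 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;  // luma
    const double u = 0.492 * (p.b - y);                        // blue-difference chroma
    const double v = 0.877 * (p.r - y);                        // red-difference chroma
    return {y, u, v};
}

// Weighted per-pixel distance that counts luma differences more than chroma.
// The 4:1:1 weighting is purely illustrative.
double pixel_distance(Rgb a, Rgb b) {
    const Yuv pa = rgb_to_yuv(a), pb = rgb_to_yuv(b);
    const double dy = pa.y - pb.y, du = pa.u - pb.u, dv = pa.v - pb.v;
    return std::sqrt(4.0 * dy * dy + du * du + dv * dv);
}
```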
ShinyHappyREM@reddit
another example
R_Sholes@reddit
It's certainly useful information, but not as much as you'd think. You can tell what's on those black and white photos just fine after all, can't you?
It's simpler and faster to work on just intensity values, and color can actually distract from features - e.g. consider something lit with multiple lights: even if the intensity is roughly uniform, the color might vary quite a bit.
So even if you start with a color image, it's easier to transform it to a single intensity channel for feature detection, and only afterwards go back to the color source for that extra information.
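The transform itself is tiny. A minimal sketch, assuming an interleaved 8-bit RGB buffer (the function name and layout here are just for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Collapse an interleaved 8-bit RGB buffer into a single intensity channel.
// Feature detection then runs on `gray` (one byte per pixel instead of three).
std::vector<uint8_t> to_intensity(const std::vector<uint8_t>& rgb,
                                  int width, int height) {
    std::vector<uint8_t> gray(static_cast<size_t>(width) * height);
    for (size_t i = 0; i < gray.size(); ++i) {
        const uint8_t r = rgb[3 * i + 0];
        const uint8_t g = rgb[3 * i + 1];
        const uint8_t b = rgb[3 * i + 2];
        gray[i] = static_cast<uint8_t>(0.299 * r + 0.587 * g + 0.114 * b);
    }
    return gray;
}
```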
Magneon@reddit
It's worth noting that while the above implementation is /simpler/, OpenCV will be remarkably faster, at least on any x86 or ARM system that has AVX or equivalent SIMD instructions. That's all handled under the hood without fanfare, but try running a simple convolution kernel in greyskull and then in OpenCV. On x86_64, OpenCV should be around 32x faster, since it'll be doing a similar loop but operating on 32 bytes (one 256-bit register) per iteration rather than 1.
This could also be true on embedded systems that support 32-bit SIMD, if the library implemented it (4x faster pixel operations than 8 bits per loop).
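For a rough version of that comparison, assuming OpenCV is available (the image size and the 3x3 box kernel are arbitrary), something like this puts a byte-at-a-time loop next to cv::filter2D:

```cpp
#include <chrono>
#include <cstdio>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

int main() {
    cv::Mat img(2160, 3840, CV_8UC1);                    // arbitrary large grayscale image
    cv::randu(img, cv::Scalar(0), cv::Scalar(256));
    cv::Mat k = cv::Mat::ones(3, 3, CV_32F) / 9.0f;      // 3x3 box blur kernel

    // Naive loop: one output pixel per iteration, no SIMD (borders skipped).
    cv::Mat naive(img.size(), CV_8UC1, cv::Scalar(0));
    auto t0 = std::chrono::steady_clock::now();
    for (int y = 1; y < img.rows - 1; ++y)
        for (int x = 1; x < img.cols - 1; ++x) {
            float acc = 0.f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += img.at<uchar>(y + dy, x + dx) * k.at<float>(dy + 1, dx + 1);
            naive.at<uchar>(y, x) = static_cast<uchar>(acc);
        }
    auto t1 = std::chrono::steady_clock::now();

    // Same convolution via OpenCV, which vectorizes under the hood.
    cv::Mat fast;
    cv::filter2D(img, fast, -1, k);
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("naive:    %.1f ms\n", ms(t1 - t0).count());
    std::printf("filter2D: %.1f ms\n", ms(t2 - t1).count());
}
```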
carrottread@reddit
And sometimes with fanfare too. I've stumbled into some scenarios where OpenCV throws some exceptions deep inside itself and then catches them. It produced correct results, but all that throwing and catching degraded performance enough to become slower than manual loops processing a single pixel per iteration.
Magneon@reddit
Yeah. I spent some time profiling OpenCV-based image processing and was surprised to find cases where a 90% reduction in pixel workload made things slower, because OpenCV is so well optimized at handling large loops and my more targeted approach was breaking that.
It's actually remarkably hard to force OpenCV to use or not use specific optimization backends, which is frustrating when you run into issues like that.
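As far as I know, the coarse control that does exist is a global on/off toggle in the core module rather than a per-backend switch, something like:

```cpp
#include <cstdio>
#include <opencv2/core.hpp>

int main() {
    // Coarse, global switch: disables OpenCV's optimized code paths as a whole.
    cv::setUseOptimized(false);
    std::printf("optimized code paths enabled: %s\n",
                cv::useOptimized() ? "yes" : "no");

    // Re-enable them afterwards.
    cv::setUseOptimized(true);
}
```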
ShinyHappyREM@reddit
For large images, a more cache-friendly approach is tiling and swizzling.
CobaltBlue@reddit
> A grayscale image is essentially a 2D array of these pixels, defined by its width and height, but for a simpler memory layout languages such as C often represent it as a 1D array of size width * height

That felt like a weird line, since memory is all just one contiguous 1D array anyway; multi-dimensional arrays are always just the compiler doing the index calculation for you.

Cache-friendly data layouts are dope tho!
neutronium@reddit
If your image is very large, then a neighboring pixel on the next line will be thousands of bytes away in address space, and jumping around in address space like that isn't very cache friendly. OTOH, if you divide your image into, for instance, 8x8 blocks, the whole block can fit into one cache line. Of course it now takes a little bit of arithmetic to figure out the address of a particular pixel, but processors are much faster at arithmetic than they are at memory access.
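To make the address arithmetic concrete, here's a minimal sketch of plain row-major indexing next to an 8x8-tiled layout for an 8-bit image (width is assumed to be a multiple of 8 to keep it short):

```cpp
#include <cstddef>
#include <cstdio>

// Plain row-major layout: pixel (x, y) lives at y * width + x.
inline size_t linear_index(int x, int y, int width) {
    return static_cast<size_t>(y) * width + x;
}

// 8x8-tiled layout: the image is split into 8x8 blocks stored one after
// another, so the 64 bytes of a block sit in a single cache line.
// Assumes width is a multiple of 8 for brevity.
inline size_t tiled_index(int x, int y, int width) {
    const int tiles_per_row = width / 8;
    const int tile_x = x / 8, tile_y = y / 8;   // which tile
    const int in_x   = x % 8, in_y   = y % 8;   // position inside the tile
    const size_t tile_base =
        (static_cast<size_t>(tile_y) * tiles_per_row + tile_x) * 64;
    return tile_base + in_y * 8 + in_x;
}

int main() {
    const int width = 4096;
    // In the linear layout, the pixel one row down is `width` bytes away;
    // in the tiled layout it's usually only 8 bytes away.
    std::printf("linear: %zu -> %zu\n", linear_index(100, 100, width),
                                        linear_index(100, 101, width));
    std::printf("tiled:  %zu -> %zu\n", tiled_index(100, 100, width),
                                        tiled_index(100, 101, width));
}
```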
Zomgnerfenigma@reddit
Wouldn't it affect optimizations?