Image Segmentation Simplified: Thresholding to Start

Project no.2

DAILY LIVES

Computer vision technology has increased in complexity and usability dramatically in the past decade, making strides in various areas of our lives. Some of the more common use cases we see in our world today include:

  • Sensors and cameras in autonomous vehicles

  • Face recognition for security or social media

  • Medical imaging in the healthcare industry

  • Product information based on visual search

Forward momentum in all these areas is still necessary, but the opportunities are endless. As obvious as computer vision's integration into our world may seem, it currently raises a huge curiosity of mine: how exactly can a computer see?

HOW COMPUTERS SEE

 

One of the building blocks of computer vision is image processing, and an important step within it is segmentation. Segmentation allows the computer to give meaning to objects that are technically just combinations of pixels. It does this by partitioning a digital image into smaller segments, simplifying the image into something more meaningful. There are many techniques for this, but one of the simplest is thresholding.

 

THRESHOLDING

Before diving into how thresholding works, it’s important to understand how computers interpret images. While we view images as meaningful representations of shapes and colors, computers view them as nothing more than arrays of numbers. Let’s take a few examples from my recent trip to Jordan.

This is how you and I see it:

 

imagesegment_edited.jpg
blue-lizard-color.png

But this is how the computer sees it:

liz-array.png

Here we are using the scikit-image library to load and read the images. Each is an array of shape 1037 x 1555 x 3, representing the image's height, width, and depth respectively. The depth in this case holds the three color channels (RGB).
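As a rough sketch of the loading step (the filename below is just a placeholder for whichever photo you're working with, and the array stands in for a real photo with the shape reported above):

```python
import numpy as np
from skimage import io

# In practice the photo is read from disk; substitute your own path:
# image = io.imread("blue-lizard.jpg")

# Synthetic stand-in with the same shape as the photos in this post:
image = np.zeros((1037, 1555, 3), dtype=np.uint8)

# height x width x depth, where depth is the 3 RGB color channels
print(image.shape)  # (1037, 1555, 3)
```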

 

To demonstrate thresholding at its simplest, let’s switch them to a single channel: grayscale.

blue-lizard-gray.png
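The conversion is a one-liner with skimage's 'rgb2gray'; here's a sketch using a random array in place of the photo:

```python
import numpy as np
from skimage.color import rgb2gray

# Random RGB stand-in for the photo (in the post this array would
# come from io.imread).
rgb = np.random.rand(100, 150, 3)

# rgb2gray takes a weighted average of the R, G and B channels and
# returns a single-channel float image with values in [0, 1].
gray = rgb2gray(rgb)

print(gray.shape)  # (100, 150): the depth axis is gone
```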

So how exactly does thresholding work?

In the simplest form, the idea is to assign black to all pixels below a given constant and white to everything above it. Since an image is just an array of numbers, each pixel is compared against the chosen constant, and whether it falls above or below that value distinguishes the foreground from the background.
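To see the logic on a tiny example (toy values, not taken from the photo):

```python
import numpy as np

# Five toy grayscale intensities and an arbitrary cutoff value.
pixels = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
t = 0.45

# Above the cutoff -> True (white, foreground); below -> False (black).
binary = pixels > t
print(binary)  # [False False  True  True  True]
```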

 

We can do this with both supervised and unsupervised methods. Supervised methods require our input, while unsupervised techniques run automatically. Let’s dig into both.

 

SUPERVISED

In order to do this manually, we need to first represent the image array as a histogram with the following variables:

  • x = intensity of pixels

  • y = number of pixels

Ideally the histogram is bimodal, so a clear threshold can be selected between the two peaks. Below is the histogram for our lizard image.

blue-lizard-hist.png
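A histogram like this can be computed straight from the grayscale array; here's a sketch with a synthetic image standing in for the rgb2gray output of the photo:

```python
import numpy as np

# Synthetic grayscale stand-in for the rgb2gray output of the photo.
rng = np.random.default_rng(0)
gray = np.clip(rng.normal(0.5, 0.2, (100, 150)), 0.0, 1.0)

# x-axis: pixel intensity (bin edges), y-axis: number of pixels per bin.
counts, bin_edges = np.histogram(gray.ravel(), bins=256, range=(0.0, 1.0))

# To draw it: plt.hist(gray.ravel(), bins=256) with matplotlib.pyplot.
print(counts.sum())  # one count per pixel
```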

Based on this, we can choose a number in the middle of the two peaks as our threshold and recreate our image as black if under and white if over that value. The new image is built on True or False logic: turn a pixel on if True, off if False. For the image below I've selected 0.45.

blue-lizard-segment.png
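Here's roughly what that looks like in code, with a simple two-level array standing in for the grayscale photo:

```python
import numpy as np

# Two-level stand-in for the grayscale photo: dark background with a
# brighter "lizard" patch.
gray = np.full((100, 150), 0.30)
gray[30:70, 40:110] = 0.65

t = 0.45              # the value read off the histogram by eye
binary = gray > t     # True -> pixel on (white), False -> off (black)

print(binary[50, 75], binary[0, 0])  # True False
```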

With these examples we can see that thresholding picks up the general outline of the objects, but some edges are missed while stray regions get captured. Lighting is a big influence on this method, so it may not be feasible in low light or when light hits the subject at different angles.

In the first example we used a supervised approach, since we manually selected the threshold. This is the simplest way of accomplishing our task, but it becomes difficult if the histogram isn’t bimodal or if we have a large number of images to process. Fortunately, the same task can be done in an unsupervised way, using algorithms built into the skimage library.

UNSUPERVISED

One common unsupervised thresholding technique is Otsu’s method, which iterates through all possible thresholds and, for each one, measures the variance within the two resulting groups of pixels, above and below the threshold. The optimal threshold is the one where the weighted sum of the two within-class variances is at a minimum. The idea is similar to what we did with the histogram, but it can produce a more accurate threshold than our eye since it checks every possibility. The downside is that it takes time and may give weaker results on images with uneven lighting.
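In skimage this is the 'threshold_otsu' function; a sketch with a synthetic bimodal image in place of the photo:

```python
import numpy as np
from skimage.filters import threshold_otsu

# Bimodal stand-in image: dark background, brighter foreground patch.
gray = np.full((100, 150), 0.2)
gray[30:70, 40:110] = 0.8

# threshold_otsu scans candidate thresholds and returns the one that
# minimizes the weighted sum of the within-class variances.
t = threshold_otsu(gray)
binary = gray > t

print(0.2 < t < 0.8)  # True: the threshold lands between the two peaks
```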

Lighting is key here. As we’ve discussed, these above methods are purely based on pixel values in grayscale and thus are influenced by the light. This is why there are two categories of thresholding:

  • Global - This is what we’ve discussed so far, where a single threshold is defined for the entire image. It’s useful when the image background is uniform and distinctly contrasts with the foreground.

  • Local (adaptive or dynamic) - This is useful when there’s greater variation in pixel intensity within both the background and the foreground. In this method each pixel, or group of pixels referred to as the block size, is evaluated separately. These groups, or local neighborhoods, are examined statistically in terms of their intensity along the black-to-white spectrum, and pixels are assigned to segments based on those values. Each neighborhood gets its own threshold. The downside is that this is slow.
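Local thresholding is available as 'threshold_local' in skimage; here's a sketch using a synthetic image with an illumination gradient, the situation where a global threshold struggles (the block size of 35 is just an illustrative choice):

```python
import numpy as np
from skimage.filters import threshold_local

# Stand-in image whose brightness rises from left to right.
gray = np.tile(np.linspace(0.0, 0.5, 150), (100, 1))
gray[40:60, :] += 0.3   # a brighter "foreground" stripe

# One threshold per pixel, each computed from a 35x35 neighborhood
# (the block size) with Gaussian weighting.
local_t = threshold_local(gray, block_size=35, method="gaussian")
binary = gray > local_t

print(local_t.shape)  # (100, 150): a threshold for every pixel
```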

There are many threshold methodologies within the skimage library, but one easy way to test out multiple at once is to use the ‘try_all_threshold’ method. This tests all of the following methods at once:

  • Li

  • Minimum

  • Triangle

  • Isodata

  • Mean

  • Otsu

  • Yen

It produces an output image for each method for easy comparison. This is useful if you’re not familiar with how each method works, and lets you choose the optimal result at a glance.
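A sketch of calling it, with a noisy two-level array standing in for the photo (the saved filename is just an example):

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; remove to display the figure
import numpy as np
from skimage.filters import try_all_threshold

# Bimodal stand-in image with a little noise, in place of the photo.
rng = np.random.default_rng(0)
gray = np.full((100, 150), 0.2)
gray[30:70, 40:110] = 0.8
gray = np.clip(gray + rng.normal(0.0, 0.05, gray.shape), 0.0, 1.0)

# Runs Li, Minimum, Triangle, Isodata, Mean, Otsu and Yen, returning a
# matplotlib figure with one panel per method for comparison.
fig, axes = try_all_threshold(gray, verbose=False)
# fig.savefig("all-thresholds.png")
```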

Here is the output for our image example:

blue-lizard-all.png

Based on this, either Otsu or Isodata seems to capture the lizard best. However, both also capture the darker section in the top right. This again shows the limitations of thresholding, which is only the simplest method of segmentation.

 

In order to better capture the lizard, some other image segmentation methodologies we could explore include edge detection or clustering. I’ll plan to cover those next time.
