OK, so the first thing we can do with our image is work out where there are sharp intensity changes. These may well correspond to edges of objects in the scene. So, suppose we have the following image (fig 7.1):
The grid on the left indicates the image intensity at different points in the image. (It will be the average intensity over the grid point in question.) The figure on the right is meant to be the original image.
So, our goal is to find the edges of the (triangular) object in the scene. Once we've worked out where the edges are then we will be in a better position to work out what it is.
A first attempt at an edge detection algorithm might be the following. We look at all adjacent pairs of array elements, and check whether the difference between their values exceeds some threshold. If it does, there may well be an edge, as there is a sudden change in image intensity.
Of course, if we use a threshold, then the threshold should really depend on the size of the array elements. We are interested in how quickly the intensity changes across a small region, so have to take into account the size of the region. In fact, what we really want here is a derivative operator , where is the distance between the array elements being considered. We look at the first derivative of intensity wrt. distance, and where it exceeds some threshold we note a possible edge point.
[I'll keep on talking about operators. These will be functions that will be applied to every point in the image array to yield a new array of values, revealing edges etc.]
Anyway, there are problems with using a first derivative operator - for example if the intensity gradually gets darker we don't want to note lots and lots of edges. What we REALLY want to find is where the derivative is at a local maximum (it changes less steeply either side). That is, we want to find out where it is most quickly changing from light to dark. Some elementary calculus should tell you that if the first derivative is a local maximum, the second derivative will be zero.
So, in edge detection we are concerned with finding the zero crossings of this second derivative. But, in what direction should we measure this, given that we have a two dimensional image. We can use two directional operators, measuring the second derivative in the x and the y directions (e.g., , measure the second derivative in the x direction). However, it turns out that a better approach is to use a non-directional operator, that basically looks at the sum of the second derivatives in both directions. A non-directional second derivative operator is the Laplacian operator . (The normal notation for this operator is: ). The Laplacian operator is often used in edge detection - in fact, there is physiological evidence that people use a similar operator (implemented as a bunch of retinal cells).
Now, if we just used a zero crossing operator as described above we would find that there are lots of spurious edges detected, resulting from noise in the image (and other things). We generally want to get rid of these, and just find edges that either correspond to edges of real physical objects (e.g., the edge of the wooden table) or to distinct surface markings on the object (e.g., stripes on the object). There are a number of things which need to be done here. The main one is to first smooth the image before applying our Laplacian operator (zero crossing detector). We can smooth an image by, given a particular point in the image, taking a weighted average of the intensity at this point with the intensities at near surrounding points. Reasonable weightings are given if we use an operator based on a Gaussian distribution G = .
So, our basic edge detection operation involves first smoothing with Gaussians and then detecting zero crossing using a Laplacian operator. It turns out (after some maths) that we can combine these into a single operator based on the Laplacian of a Gaussian distribution (). This is often known as a mexican hat operator, as it looks a bit like a mexican hat.
This is all getting pretty mathematical. The main thing is that you need to smooth an image, then find places where the intensity in the smoothed image changes quickly - and in particular where there is a local maximum in the way the intensity changes.
OK, so now we know in theory what sort of operator we should apply
to the image, but how do we do this in practice, given our
discrete, quantised image. The thing to do is to have a
template which is a discrete, finite version of the operator.
A 3x3 template which just did the smoothing function might look something
like the following:
Now, suppose we want to apply this operator to our image a couple of pages
back. We can't really apply it to the points at the edge of the image,
so we'll start off considering the image point 2 from top, 2 from
left. We (mentally) put the template centered over this image point,
and for each point in the template we multiply the weight in the template
by the intensity of that image point. The sum gives our new
smoothed value. For the point (2, 2) this will be:
To obtain the smoothed image, the operator is applied in this manner to each (non border) image point. I'll leave this as a (tedious) exercise for the reader. Though really it's one of those things you definitely want to write a program to do, not work out by hand.
Now, we could have a Gaussian operator that used a 5x5 template, or 10x10 one, and smoothed things out over a bigger region. The chosen size of operator depends on the size of the features we want to detect - do we want to detect the stripes on someone's shirt? In fact, it turns out that it is generally a good idea to use smoothing operators of different sizes. Edges detected with all the operators are more likely to be real edges, rather than spurious ones.
The template required for a Laplacian is less obvious, so we'll get there
by stages. For a directional first derivative operator () the
appropriate template is simply:
This really gives us , so the result should be divided by the pixel size . For the second derivative we want to take the difference between two first derivative templates.
(As we are now interested in zero crossings, any bit doesn't matter.)
The two dimensional version (the Laplacian) is a bit more complex. And in fact, what you actually use is the discrete version of the (mexican hat) operator, which you can obtain by first using standard calculus to get the appropriate continuous function, then working out the value of the function at the particular discrete template points.
If we do all this, and apply our mexican hat operator to the image, then we'll get an array with some nagative values, some positive, and some points in between them - these will be our zero crossings, corresponding to edges. If we further put a ``1'' at all the points where we get zero crossings, and a ``0'' otherwise, we get the following sort of image (from the figure above):
This isn't terribly clear due to our small number of image points. However, given a slightly better image we could then set out to work out where the lines (and blobs) were, and obtain a symbolic description on the image (the primal sketch). For this image it might be represented as something like the following:
Edge1: Position: [2, 2] Orientation: 50 Length: 4 Edge2: Position: [4, 2] Orientation: 170 Length 4 Edge3: Position: [3, 4] Orientation: 90 Length: 3
This will be our primal sketch.