OK, so the first thing we can do with our image is work out where there are sharp intensity changes. These may well correspond to edges of objects in the scene. So, suppose we have the following image (fig 7.1):

The grid on the left indicates the image intensity at different points in the image. (Each value is the average intensity over the grid cell in question.) The figure on the right is meant to be the original image.

So, our goal is to find the edges of the (triangular) object in the scene. Once we've worked out where the edges are then we will be in a better position to work out what it is.

A first attempt at an edge detection algorithm might be the following. We look at all adjacent pairs of array elements, and check whether the difference between their values exceeds some threshold. If it does, there may well be an edge, as there is a sudden change in image intensity.

Of course, if we use a threshold, then the threshold should really
depend on the size of the array elements. We are interested in how
quickly the intensity changes across a small region, so have to
take into account the size of the region. In fact, what we
really want here is a *derivative* operator ΔI/Δd,
where Δd is the distance between the array elements being
considered. We look at the first derivative of intensity with respect to distance,
and where it exceeds some threshold we note a possible edge point.
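As a sketch of this naive first-derivative edge test (the row of intensities, the spacing and the threshold are all made-up values for illustration):

```python
# Naive edge test: a discrete first derivative of intensity with
# respect to distance, compared against a threshold.
row = [10, 10, 11, 50, 52, 51]   # image intensities along one row (made up)
d = 1.0                          # distance between array elements
threshold = 20.0

edges = []
for i in range(len(row) - 1):
    derivative = (row[i + 1] - row[i]) / d   # (I[i+1] - I[i]) / d
    if abs(derivative) > threshold:
        edges.append(i)          # possible edge between elements i and i+1

print(edges)   # only the 11 -> 50 jump exceeds the threshold
```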

[I'll keep on talking about *operators*. These will be
functions that will be applied to every point in the image array to
yield a new array of values, revealing edges etc.]

Anyway, there are problems with using a first derivative operator - for example, if the intensity gradually gets darker we don't want to note lots and lots of edges. What we REALLY want to find is where the derivative is at a local maximum (it changes less steeply on either side). That is, we want to find where the intensity is changing most quickly from light to dark. Some elementary calculus should tell you that where the first derivative is at a local maximum, the second derivative will be zero.

So, in edge detection we are concerned with finding the *zero crossings*
of this second derivative. But in what direction should we
measure this, given that we have a two dimensional image?
We can use two *directional*
operators, measuring the second derivative in the
x and the y directions (e.g., ∂²I/∂x² measures the
second derivative in the x direction). However, it turns out that
a better approach is to use a *non-directional* operator,
one that basically looks at the sum of the second derivatives
in both directions.
A non-directional second derivative operator is the Laplacian operator
∂²I/∂x² + ∂²I/∂y². (The normal notation
for this operator is ∇².) The Laplacian operator is
often used in edge detection - in fact, there is physiological
evidence that people use a similar operator (implemented as
a bunch of retinal cells).
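At a single interior point, the discrete Laplacian is just the sum of second differences in the x and y directions. A sketch (the small image is made up for illustration):

```python
# Discrete Laplacian at one interior point: the sum of second
# differences in the x and y directions.
img = [
    [10, 10, 10, 10],
    [10, 10, 50, 50],
    [10, 50, 50, 50],
    [10, 50, 50, 50],
]

def laplacian(img, y, x):
    # d2I/dx2 ~ I[x-1] - 2*I[x] + I[x+1]; likewise in y.
    # Their sum is the (non-directional) Laplacian.
    d2x = img[y][x - 1] - 2 * img[y][x] + img[y][x + 1]
    d2y = img[y - 1][x] - 2 * img[y][x] + img[y + 1][x]
    return d2x + d2y

print(laplacian(img, 1, 1))   # large value: intensity changing quickly here
```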

Now, if we just used a zero crossing operator as described above
we would find that there are lots of spurious edges detected,
resulting from noise in the image (and other things).
We generally want to get rid of these, and just
find edges that either correspond to edges of real physical
objects (e.g., the edge of the wooden table) or to distinct surface
markings on the object (e.g., stripes on the object).
There are a number of things which need to be done here. The
main one is to first *smooth* the image before applying our
Laplacian operator (zero crossing detector). We can smooth an
image by, given a particular point in the image, taking a
weighted average of the intensity at this point with the intensities
at near surrounding points. Reasonable weightings are
given if we use an operator based on a Gaussian distribution G(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)).
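For instance, sampling the Gaussian G(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)) on a 3x3 grid (with an assumed σ = 1, and normalising so the weights sum to 1) gives a set of smoothing weights:

```python
import math

# Gaussian weights sampled on a 3x3 grid and normalised to sum to 1.
sigma = 1.0

def gaussian(x, y, sigma):
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

weights = [[gaussian(x, y, sigma) for x in (-1, 0, 1)] for y in (-1, 0, 1)]
total = sum(sum(row) for row in weights)
weights = [[w / total for w in row] for row in weights]   # normalise

print(round(weights[1][1], 3))   # centre weight is the largest
```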

So, our basic edge detection operation involves first smoothing with Gaussians and then detecting zero crossings using a Laplacian operator. It turns out (after some maths) that we can combine these into a single operator based on the Laplacian of a Gaussian distribution (∇²G). This is often known as a Mexican hat operator, as its graph looks a bit like a Mexican hat.

This is all getting pretty mathematical. The main thing is that you need to smooth an image, then find places where the intensity in the smoothed image changes quickly - and in particular where there is a local maximum in the way the intensity changes.

OK, so now we know in theory what sort of operator we should apply
to the image, but how do we do this in practice, given our
discrete, quantised image? The thing to do is to have a
*template* which is a discrete, finite version of the operator.
A 3x3 template which just did the smoothing function might look something
like the following:
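The figure with the template itself is missing from these notes; a commonly used 3x3 smoothing template (a rough approximation to a Gaussian, offered here as an assumption rather than necessarily the one in the original figure) is:

```python
# A common 3x3 smoothing template (rough Gaussian approximation;
# not necessarily the one from the missing original figure).
template = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]

# The weights sum to 1, so smoothing preserves the overall brightness.
print(sum(sum(row) for row in template))   # 1.0
```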

Now, suppose we want to apply this operator to our image a couple of pages
back. We can't really apply it to the points at the edge of the image,
so we'll start off considering the image point 2 from top, 2 from
left. We (mentally) put the template centered over this image point,
and for each point in the template we multiply the weight in the template
by the intensity of that image point. The sum gives our new
smoothed value. For the point (2, 2) this will be:

To obtain the smoothed image, the operator is applied in this
manner to each (non border) image point. I'll leave this as
a (tedious) exercise for the reader. Though really it's one of those things you
definitely want to write a program to do, not work out by hand.
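And indeed it is the sort of thing a short program handles easily. A sketch of the whole smoothing pass over every non-border point (the image values and the 3x3 template here are made up for illustration):

```python
# Apply a 3x3 smoothing template to every non-border image point:
# centre the template over each point and sum weight * intensity.
img = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
template = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]

def smooth(img, template):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # border points kept as-is
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                template[j + 1][i + 1] * img[y + j][x + i]
                for j in (-1, 0, 1) for i in (-1, 0, 1)
            )
    return out

smoothed = smooth(img, template)
print(smoothed[1][1])   # the sharp 10/50 boundary has been blurred
```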

Now, we could have a Gaussian operator that used a 5x5 template, or 10x10 one, and smoothed things out over a bigger region. The chosen size of operator depends on the size of the features we want to detect - do we want to detect the stripes on someone's shirt? In fact, it turns out that it is generally a good idea to use smoothing operators of different sizes. Edges detected with all the operators are more likely to be real edges, rather than spurious ones.

The template required for a Laplacian is less obvious, so we'll get there
by stages. For a directional first derivative operator (∂/∂x) the
appropriate template is simply:

    [-1  1]

This really gives us a difference in intensity rather than a true derivative, so the result should be divided by the pixel size d. For the second derivative we want to take the difference between two (shifted) first derivative templates, which gives:

    [1  -2  1]

(As we are now interested in zero crossings, any constant scale factor, such as 1/d², doesn't matter.)
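A sketch of these one-dimensional derivative templates in action, using the standard [-1, 1] and [1, -2, 1] forms (the intensity values are made up for illustration):

```python
# First and second derivative templates applied along one row.
row = [10, 10, 30, 50, 50]

# first derivative template [-1, 1]: gives I[x+1] - I[x]
d1 = [row[i + 1] - row[i] for i in range(len(row) - 1)]

# second derivative template [1, -2, 1]: difference of two
# shifted first-derivative templates
d2 = [row[i - 1] - 2 * row[i] + row[i + 1] for i in range(1, len(row) - 1)]

print(d1)   # [0, 20, 20, 0]
print(d2)   # [20, 0, -20]: the sign change marks the steepest intensity change
```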

The two dimensional version (the Laplacian) is a bit more complex. And in fact, what you actually use is the discrete version of the (mexican hat) operator, which you can obtain by first using standard calculus to get the appropriate continuous function, then working out the value of the function at the particular discrete template points.

If we do all this, and apply our Mexican hat operator to the image, then we'll get an array with some negative values, some positive, and some points in between them where the sign changes - these will be our zero crossings, corresponding to edges. If we further put a "1" at all the points where we get zero crossings, and a "0" otherwise, we get the following sort of image (from the figure above):
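Marking the zero crossings can be sketched in one dimension like this (the operator output values are made up for illustration):

```python
# Mark zero crossings: a point where the operator output changes sign
# between neighbours is labelled 1, everything else 0.
values = [5, 3, 1, -2, -4, -1, 2, 4]   # made-up operator output along a row

crossings = [0] * len(values)
for i in range(len(values) - 1):
    if (values[i] > 0) != (values[i + 1] > 0):
        crossings[i] = 1               # sign change between i and i+1

print(crossings)
```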

This isn't terribly clear due to our small number of image points. However, given a slightly better image we could then set out to work out where the lines (and blobs) were, and obtain a symbolic description of the image (the primal sketch). For this image it might be represented as something like the following:

Edge1: Position: [2, 2]  Orientation: 50   Length: 4
Edge2: Position: [4, 2]  Orientation: 170  Length: 4
Edge3: Position: [3, 4]  Orientation: 90   Length: 3

This will be our *primal sketch*.

Fri Aug 19 10:42:17 BST 1994