Natural language systems are developed both to explore general linguistic theories and to produce natural language interfaces or front ends to application systems. In the discussion here we'll generally assume that there is some application system that the user is interacting with, and that it is the job of the understanding system to interpret the user's utterances and ``translate'' them into a suitable form for the application (for example, into a database query). We'll also assume for now that the natural language in question is English, though in general it might of course be any other language.
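As a concrete (and entirely invented) illustration of that final translation step, the sketch below turns an already-analysed question into an SQL query. The dictionary format and the table and column names are assumptions made purely for this example, not part of any particular system.

    # Toy final stage: an analysed question becomes a database query.
    def interpretation_to_sql(interp):
        """Build an SQL string from a toy semantic representation."""
        # interp is assumed to look like:
        #   {"ask": "name", "entity": "employee", "constraint": ("dept", "sales")}
        column, value = interp["constraint"]
        return ("SELECT {0} FROM {1} WHERE {2} = '{3}'"
                .format(interp["ask"], interp["entity"], column, value))

    # "Which employees work in sales?" might, after analysis, come out as:
    print(interpretation_to_sql(
        {"ask": "name", "entity": "employee", "constraint": ("dept", "sales")}))
    # SELECT name FROM employee WHERE dept = 'sales'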
In general the user may communicate with the system by speaking or by typing. Understanding spoken language is much harder than understanding typed language: our input is just the raw speech signal (normally a plot of how much energy is coming in at different frequencies at different points in time). Before we can get to work on what the speech means we must work out from the frequency spectrogram what words are being spoken. This is very difficult to do in general. For a start, different speakers have different voices and accents, and an individual speaker may articulate differently on different occasions, so there is no simple mapping from speech waveform to word. There may also be background noise, so we have to separate the signal resulting from the wind whistling in the trees from the signal resulting from Fred saying ``Hello''.
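To make the ``energy at different frequencies at different times'' picture more concrete, here is a small sketch (in Python, using the SciPy library) that computes exactly such a spectrogram from a recording. The file name is made up; any mono WAV file would do.

    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, samples = wavfile.read("hello.wav")        # hypothetical mono recording
    freqs, times, energy = spectrogram(samples, fs=rate)

    # energy[i, j] is (roughly) how much energy frequency freqs[i] carries
    # at time times[j]; this grid, not a neat string of words, is what a
    # speech understanding system actually starts from.
    print(energy.shape)    # (number of frequency bins, number of time slices)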
Even if the speech is very clear, it may be hard to work out what words were spoken. There may be many different ways of splitting up a sentence into words: in fluent speech there are generally virtually no pauses between words, and the understanding system must guess where the word breaks are. As an example of this, consider the sentence ``how to recognise speech''. If spoken quickly this might be misheard as ``how to wreck a nice beach''. And even if we get the word breaks right we may still not know what words were spoken: some words sound alike (e.g., bear and bare), and it may be impossible to tell which was meant without thinking about the meaning of the sentence.
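The word-break problem can be shown in miniature with ordinary text: strip the spaces out of a sentence and ask how many ways it can be carved back into dictionary words. The tiny lexicon below is invented for illustration; with a realistic lexicon (and with sound-alike words rather than letters) far more competing segmentations appear.

    LEXICON = {"how", "to", "recognise", "wreck", "a", "nice", "speech", "beach"}

    def segmentations(text):
        """Return every way of splitting text into a sequence of lexicon words."""
        if not text:
            return [[]]
        results = []
        for i in range(1, len(text) + 1):
            word, rest = text[:i], text[i:]
            if word in LEXICON:
                for tail in segmentations(rest):
                    results.append([word] + tail)
        return results

    print(segmentations("howtorecognisespeech"))
    # [['how', 'to', 'recognise', 'speech']] here, but a bigger lexicon would
    # typically return several rival segmentations.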
Because of all these problems a speech understanding system may come up with a number of alternative word sequences, perhaps ranked according to their likelihood. Any ambiguities about what the words in the sentence are (e.g., bear/bare) will be resolved when the system starts trying to work out the meaning of the sentence.
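One simple way to rank rival word sequences, sketched below, is to score each against word frequencies, a crude stand-in for a statistical language model. The frequencies here are invented; a real system would estimate them from large amounts of text, and would still fall back on meaning for cases like bear/bare.

    WORD_FREQ = {"the": 1000, "bear": 40, "bare": 15, "growled": 5}

    def score(words):
        """Crude likelihood: product of (smoothed) word frequencies."""
        total = 1.0
        for w in words:
            total *= WORD_FREQ.get(w, 1)     # unseen words get a tiny count
        return total

    hypotheses = [["the", "bear", "growled"],
                  ["the", "bare", "growled"]]
    ranked = sorted(hypotheses, key=score, reverse=True)
    print(ranked[0])    # ['the', 'bear', 'growled'], the more plausible reading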
Whether we start with speech signals or typed input, at some stage we'll have a list (or lists) of words and will have to work out what they mean. There are three main stages to this analysis: syntactic analysis, semantic analysis and pragmatic analysis (so, if we include speech understanding, we can think of there being four stages of processing).
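The overall shape of the processing can be sketched as a pipeline of stub functions, one per stage. The bodies are placeholders (the later sections say what each stage really involves), and the function names are just labels chosen for this sketch.

    def speech_recognition(signal):            # signal -> candidate word sequences
        ...

    def syntactic_analysis(words):             # words -> grammatical structure (parse tree)
        ...

    def semantic_analysis(parse_tree):         # parse tree -> meaning representation
        ...

    def pragmatic_analysis(meaning, context):  # meaning + context -> intended interpretation
        ...

    def understand(signal, context):
        words = speech_recognition(signal)
        tree = syntactic_analysis(words)
        meaning = semantic_analysis(tree)
        return pragmatic_analysis(meaning, context)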
We won't be saying much more about speech understanding in this
course. Generally it is feasible if we train a system for
a particular speaker, who speaks clearly in a quiet room, using
a limited vocabulary. State-of-the-art systems might allow
us to relax one of these constraints (e.g., single speaker),
but no system has been developed that works effectively
for a wide range of speakers, speaking quickly with a wide vocabulary in
a normal environment. Anyway, the next section will discuss in some
detail the stage of syntactic analysis. The sections after that
will discuss rather more briefly what's involved in semantic and
pragmatic analysis. We will present these as successive stages, where
we first do syntax, then semantics, then pragmatics. However, you
should note that in practical systems it may NOT always be appropriate to
fully separate out the stages, and we may want to interleave, for example,
syntactic and semantic analysis.