Artificial Intelligence Midterm Exam: Solutions
Problem 1. (35 points)
Suppose that the following facts and rules have been entered into Prolog:
p(a, [a,a,a]).
p(a, [a,b]).
p(b, [b,a]).
p(b, [a,b]).
p(c, [a,c]).
q(X,M) :- p(X, [X,Y|L]), !, p(Y, M).
q(X,M) :- p(X, [M,X]).
For each of the following queries, list all the answers that Prolog will
give on successive backtrackings. (For example, for the query
``?- p(a,L).'' the answer would be ``First Prolog returns L=[a,a,a]; then
L=[a,b]; then fails.'')
A. ?- p(X, [X|L]).
B. ?- p(X, [X,Y|L]).
C. ?- p(X, [a|X]).
D. ?- q(c,Z).
E. ?- q(S,T).
Answer:
A. Succeeds with X=a, L=[a,a]; then with X=a, L=[b]; then with X=b, L=[a];
then fails.
B. Succeeds with X=a, Y=a, L=[a]; then with X=a, Y=b, L=[]; then with
X=b, Y=a, L=[]; then fails.
C. Fails.
D. Succeeds with Z=a (using the second rule for q); then fails.
E. Succeeds with S=a, T = [a,a,a]; then with S=a, T=[a,b]; then fails.
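To see why query E yields exactly these two answers, here is an annotated
walk-through of the cut (the annotations are ours; "first rule" and "second
rule" refer to the two rules for q above):

% ?- q(S,T).
% The first rule calls p(S, [S,Y|L]), which first matches p(a, [a,a,a])
% with S=a, Y=a, L=[a]. The cut then commits Prolog to this rule and to
% these bindings: it will neither retry p(S, [S,Y|L]) nor fall through
% to the second rule for q.
% The remaining goal p(Y,M) = p(a,M) yields M=[a,a,a] and, on
% backtracking, M=[a,b]; after that the whole query fails.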
Problem 2
We wish to write a Prolog predicate ``int(N)'' that returns successive values
of the integers on successive backtrackings. That is, the query
``?- int(N)'' first succeeds with N=0; then on backtracking succeeds with
N=1; then with N=2; and so on.
Which of the following is the correct code for int? (Only one is correct.)
i. int(N) :- 0.
int(N) :- int(M) + 1.
ii. int(0).
int(N) :- int(M), N is M+1.
iii. int(0).
int(N) :- int(N-1).
iv. int(0).
int(N) :- int(N), N is N+1.
v. int(0).
int(N) :- M is N-1, int(M).
Answer: ii. (Choice i tries to call the number 0 as a goal; iii never binds
N to a new value; iv requires ``N is N+1'', which no number satisfies; and
v evaluates ``M is N-1'' while N is still unbound, which is an error.)
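For reference, here is choice ii as it would actually be loaded and queried;
this is just the code above restated with comments, and the query transcript
shows the behavior described in the problem statement:

int(0).                        % base case: the first answer is N=0
int(N) :- int(M), N is M + 1.  % on each redo, take an earlier answer M
                               % and return M+1

% ?- int(N).
% N = 0 ;
% N = 1 ;
% N = 2 ;
% ...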
B. Suppose we have the following Prolog definition:
w(N) :- int(I), int(J), N is (10 * J) + I.
What values are returned for the query ``?- w(N)'' on successive backtrackings?
Answer: 0, 10, 20, 30, ... Note that, since the call int(J) backtracks
forever, Prolog never backtracks to the call to int(I).
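A comment-only sketch of that search order, for concreteness:

% ?- w(N).
% int(I) succeeds with I = 0 and is never retried, because the later
%        goal int(J) always has another answer to offer first.
% int(J) succeeds with J = 0, 1, 2, ... on successive backtrackings.
% N is (10 * J) + I therefore yields 0, 10, 20, 30, ...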
Problem 3 (40 points)
Consider the following simple context free grammar:
S -> NP VP
NP -> <pronoun> | {<det>} <adj>* <noun> PP*
VP -> {<modal>} <verb> {NP} {NP} PP*
PP -> <prep> NP
<adj> : cold
<det> : the, a
<modal> : can
<noun> : fish, river, swim, cold
<prep> : for, in
<pronoun> : I
<verb> : can, fish, swim, caught
(Here {X} means that X is optional, and X* means zero or more occurrences of X.)
(Note: ``can'' as a verb means, ``Put in a can''; e.g. ``I am busy canning
tomatoes today.'' Swim is a noun, as in "John went for a swim.")
A. Show all the parse trees for ``I can fish''.
B. The natural reading of ``I can fish'' is ``I am able to fish'',
rather than ``I put fish into cans.'' What strategy could a program use
to disambiguate this sentence correctly?
C. Show the two parse trees for ``I caught a cold fish.''
D. What strategy could a program use to choose the correct parse
tree for ``I caught a cold fish''?
E. In ``I caught a fish'', the word ``caught'' means ``captured''.
In ``I caught a cold'', the word ``caught'' means ``became ill with''.
What strategy could be used to find the correct meaning of ``caught''
in each sentence (supposing that these were the only two possibilities)?
Answer:
A. There are two parse trees:

S -> NP -> <pronoun> -> I
  -> VP -> <modal> -> can
        -> <verb> -> fish

S -> NP -> <pronoun> -> I
  -> VP -> <verb> -> can
        -> NP -> <noun> -> fish
B. Simple lexical frequency is best here: "can" is very much more often
used as the modal meaning "able to" than as the verb meaning "put in cans".
(Incidentally, in spoken English, the two are distinguished. If the
meaning is "I am able to fish", the word "can" would be unstressed and the
vowel would be likely to shift to "uh". If the meaning were "put into cans",
the word would have to get at least equal stress and the vowel enunciated
clearly.)
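As a concrete illustration of the frequency strategy, here is a minimal
Prolog sketch; the pos_freq/3 facts and the numbers in them are invented
for illustration, not real corpus counts:

:- use_module(library(lists)).   % for max_member/2

% pos_freq(Word, PartOfSpeech, RelativeFrequency) -- hypothetical,
% as if tallied from a part-of-speech-tagged corpus.
pos_freq(can, modal, 0.99).
pos_freq(can, verb,  0.01).

% Pick the most frequent part of speech for a word.
best_pos(Word, POS) :-
    findall(F-P, pos_freq(Word, P, F), Pairs),
    max_member(_-POS, Pairs).

% ?- best_pos(can, POS).   POS = modal.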
C.

S -> NP -> <pronoun> -> I
  -> VP -> <verb> -> caught
        -> NP -> <det> -> a
              -> <adj> -> cold
              -> <noun> -> fish

S -> NP -> <pronoun> -> I
  -> VP -> <verb> -> caught
        -> NP -> <det> -> a
              -> <noun> -> cold
        -> NP -> <noun> -> fish
D.
The English construction with two NP's following the verb is used
in sentences like "John gave Mary a rose", or "John told Mary a story".
Here, the second NP is the object of the verb, and the first NP is the
indirect object, the recipient of the action. (I did mention this in
class, but I should have reminded you of it on the exam. My apologies.)
One can use "catch" in this construction (e.g. "I caught Mary a fish
for breakfast"), but there is a selectional restriction that the
indirect object has to be the kind of thing that can possess or own the
direct object; either animate or an organization. One cannot catch fish
for a cold, because a cold is not animate and cannot own fish.
This was probably too hard a problem for this setting, so I have been lenient
in accepting reasonable answers.
E. The word "caught" meaning "seized" has a selectional restriction that
it takes a physical object as its object. The word "caught" meaning
"became ill with" has a selectional restriction that it takes a contagious
disease as its object.
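A minimal Prolog sketch of the selectional-restriction strategy from parts
D and E; all predicate names and facts here are invented for illustration:

% Hypothetical semantic categories.
physical_object(fish).
contagious_disease(cold).

% sense_of(+Verb, +Object, -Meaning): choose the verb sense whose
% selectional restriction the object satisfies.
sense_of(caught, Obj, seized)          :- physical_object(Obj).
sense_of(caught, Obj, became_ill_with) :- contagious_disease(Obj).

% ?- sense_of(caught, fish, M).   M = seized.
% ?- sense_of(caught, cold, M).   M = became_ill_with.

The same idea handles part D: the ditransitive rule for "catch" would carry
a restriction that its indirect object be animate (or an organization),
which "a cold" fails.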
Problem 4 (10 points)
A.
Let w[1], w[2] ... w[k] be a sequence of k words. Which of
the following formulas expresses the trigram assumption? (Only one
is correct.) Explain the formula in words.
i. P(w[I]) = P(w[I] | w[1] w[2] ... w[I])
ii. P(w[I]) = P(w[I] | w[1] w[2] ... w[I-1])
iii. P(w[I]) = P(w[I] | w[I-2] w[I-1])
iv. P(w[I] | w[1] w[2] ... w[I-1]) = P(w[I] | w[I-1])
v. P(w[I] | w[1] w[2] ... w[I-1]) = P(w[I] | w[I-2] w[I-1])
Answer: v. The probability of finding a specified word w[I] after a
specified sequence w[1] ... w[I-1] depends only on the two most recent
words w[I-2] and w[I-1].
(Incidentally, the meaning of formula iii would be this: The probability
of finding word w[I] after words w[I-2] and w[I-1] is the same as the
probability of finding word w[I] given no information about the other
words in the sentence. In other words, this formula would mean that
w[I] is independent of the two previous words in the sentence.)
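For completeness: applied to a whole sentence via the chain rule, the
trigram assumption approximates each factor by its two-word history (a
standard identity, stated here in the same notation as above):

P(w[1] ... w[k]) = product over I of P(w[I] | w[1] ... w[I-1])   (chain rule)
                 ~ product over I of P(w[I] | w[I-2] w[I-1])     (trigram assumption)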
B.
In reading a new text, it is quite likely that you will encounter a triple
of consecutive words that you have never encountered anywhere else. Why
is this a problem for building a usable trigram model of natural language?
Describe a solution that is commonly used for this problem.
Answer: The problem is that, if you estimate the probability
P(w[I] | w[I-2] w[I-1]) as the frequency ratio freq(w[I] | w[I-2] w[I-1]) =
#(w[I-2] w[I-1] w[I]) / #(w[I-2] w[I-1]) and the sequence w[I-2] w[I-1] w[I]
has never been encountered before, then it will be assigned a probability
of zero. That is, it will be treated as impossible and any other reading,
no matter how implausible, will be preferred.
The solution is to use smoothing, where the probability
P(w[I] | w[I-2] w[I-1]) is estimated as a weighted sum, combining the
unigram model, the bigram model, and the trigram model:
P(w[I] | w[I-2] w[I-1]) =
L1*freq(w[I]) + L2*freq(w[I] | w[I-1]) + L3*freq(w[I] | w[I-2] w[I-1])
where L1, L2, L3 are positive weights summing to 1. That is, each word
w[I] has a certain probability of turning up in an entirely new context,
and each word has a certain probability of following a word w[I-1] that
it's been seen with before.
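A minimal sketch of this interpolated estimate in Prolog; the lookup
predicates freq_uni/2, freq_bi/3, freq_tri/4 and every number below are
invented toy values, not counts from any real corpus:

% Toy relative-frequency tables.
freq_uni(fish, 0.02).                  % freq(w[I])
freq_bi(the, fish, 0.1).               % freq(w[I] | w[I-1])
freq_tri(caught, the, fish, 0.4).      % freq(w[I] | w[I-2] w[I-1])

lambda(0.1, 0.3, 0.6).                 % L1, L2, L3; they sum to 1

% smoothed_p(+W, +Prev2, +Prev1, -P): interpolated trigram estimate
% of P(W | Prev2 Prev1).
smoothed_p(W, Prev2, Prev1, P) :-
    lambda(L1, L2, L3),
    freq_uni(W, F1),
    freq_bi(Prev1, W, F2),
    freq_tri(Prev2, Prev1, W, F3),
    P is L1*F1 + L2*F2 + L3*F3.

% ?- smoothed_p(fish, caught, the, P).
% P = 0.272 (up to floating-point rounding).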