A. Given such a corpus, how could you automatically set up a trigram model to address the problem of lexical disambiguation for a standard (untagged) text?

**Answer:**
Each element of the trigram model will be a particular meaning
of a particular word, such as "bat-3". The model will record the probability
that word-meaning w_{3} follows word-meanings
w_{1}, w_{2}. To avoid the sparse-data problem, this probability
is estimated as the weighted sum

P(w_{3} | w_{1},w_{2}) =
L_{1} Freq(w_{3} | w_{1},w_{2}) +
L_{2} Freq(w_{3} | w_{2}) +
L_{3} Freq(w_{3}),

where L_{1}, L_{2}, L_{3} are positive weights
adding to 1.
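The interpolated estimate can be sketched in Python. The weight values and the toy counts below are illustrative, not from the text; each `Freq` term is computed as a relative frequency over the sense-tagged corpus:

```python
from collections import Counter

def interpolated_prob(w1, w2, w3, trigrams, bigrams, unigrams, total,
                      l1=0.6, l2=0.3, l3=0.1):
    """Linearly interpolated trigram estimate of P(w3 | w1, w2).

    Each w is a word-meaning token such as "bat-3".  The weights
    l1, l2, l3 (here arbitrary) must be positive and sum to 1.
    """
    # Relative frequency of w3 given the two preceding word-meanings.
    f3 = trigrams[(w1, w2, w3)] / bigrams[(w1, w2)] if bigrams[(w1, w2)] else 0.0
    # Back off to the bigram and unigram relative frequencies.
    f2 = bigrams[(w2, w3)] / unigrams[w2] if unigrams[w2] else 0.0
    f1 = unigrams[w3] / total
    return l1 * f3 + l2 * f2 + l3 * f1
```

Because `Counter` returns 0 for unseen n-grams, a trigram never observed in the corpus still gets a nonzero estimate through the bigram and unigram terms, which is the point of the interpolation.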

B. What is the difference between the trigram model in (A) and the disambiguation strategy, "Frequency in context"? Give an example of a sentence that could be disambiguated using the trigram model but not using frequency in context. Give an example of a sentence that could be disambiguated using frequency in context but not using the trigram model.

**Answer:**
The trigram model in (A) picks the meaning of a word by considering the last
two words. The strategy "Frequency in context"
picks the meaning of a word by considering the general context of the discussion.
For example, in the sentence "Last night's baseball game was interrupted by
a flock of bats", the trigram model would correctly interpret "bats"
to mean "flying mammals", since that meaning is much more likely to follow the words
"flock of". The "frequency in context" heuristic, on the other hand,
would get this wrong: the context would be taken to be "baseball game",
and "bat" would be interpreted as "club". In the sentences
"In the zoo, there is a bat" and "One thing you need for a baseball
game is a bat", the trigram model provides no guidance,
since the two preceding words "is a" do not affect the likelihood of the
meanings of "bat". However, frequency in context gets these right:
the contexts are "zoo" and "baseball", and the correct meanings are chosen.

**Answer:**
The input to the syntactic parser is a morphological analysis of the
sentence. The output is a syntax tree. The input to the semantic analyzer
is the syntax tree. The output is a representation of the meaning in
some standard representation of knowledge. For example, if the sentence
is "John likes Mary", the syntax tree would be

```
S ---> NP ---> name ---> John
  |
  |--> VP ---> verb ---> likes
         |
         |--> NP ---> name ---> Mary
```

and the output of the semantic analysis might be something like "likes(john, mary)".

Consider the following problem: You are given a directed graph G. Every vertex has an associated positive numerical value. The total value of a path through G is the sum of the values of the vertices on the path. If a path goes more than once through a vertex, it picks up that vertex's value each time it passes through. For instance, in the graph below, the path ACBACD has value 2+1+3+2+1+5 = 14.

The problem is, given such a graph, a starting vertex, and a quota to be attained, find the shortest path whose value is greater than or equal to the quota. For example, in the above graph, if the starting vertex is A and the quota is 12, then path ADEG, with length 4, is the optimal solution.

The problem can be addressed using search in the following state space: (State space search is not actually the best approach to this problem, but never mind that.)

- A state is a path, such as ACBA.
- An operator on state S is to add an arc out of the last vertex in S. For example, there are four possible operators on state "ACBA": (1) Add arc A->C to get ACBAC. (2) Add arc A->D to get ACBAD. (3) Add arc A->F to get ACBAF. (4) Add arc A->H to get ACBAH.
- The starting state is the path containing just the starting vertex.
- A goal state is a path whose value exceeds the quota.
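The state space can be sketched directly from these definitions. The arcs and the values of vertices A, B, C, D below are the ones recoverable from the text; the remaining arcs and the values of E, F, G, H are hypothetical stand-ins:

```python
# Arcs recoverable from the worked examples; E..H values are made up.
GRAPH = {"A": ["C", "D", "F", "H"], "B": ["A"], "C": ["B", "D"],
         "D": ["E"], "E": ["G"], "F": ["E"], "G": [], "H": ["A", "F"]}
VALUE = {"A": 2, "B": 3, "C": 1, "D": 5,
         "E": 2, "F": 1, "G": 4, "H": 1}  # E..H are hypothetical

def successors(state):
    """Apply every operator to a state (a path such as "ACBA"):
    extend it by one arc out of its last vertex."""
    return [state + v for v in GRAPH[state[-1]]]

def value(state):
    """Total value of a path; a revisited vertex counts each time."""
    return sum(VALUE[v] for v in state)

def is_goal(state, quota):
    # Per the problem statement: value greater than or equal to the quota.
    return value(state) >= quota
```

With these numbers, `successors("ACBA")` yields the four states listed in the operator description, and `value("ACBACD")` is 14, matching the worked example.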

Let V be the number of vertices in G. Let E be the number of edges. Let O be the maximum out-degree of any vertex; that is, O is the maximum number of outarcs from any vertex. Let M be the minimum value of any vertex. For instance in the above graph V=8, E=15, O=4 (vertex A has 4 out-arcs), M=1.

A. Is the state space a tree? Explain your answer.

**Answer:** Yes, it is a tree. For any state such as ACBA, there is
only one way to construct it from the starting state A: "Add C",
"Add B", "Add A"

B. For the above example, with starting vertex A, draw the portion of the state space within two steps of the starting state.

**Answer:**

```
A ---> AC ---> ACD
  |      |--> ACB
  |--> AD ---> ADE
  |--> AF ---> AFE
  |--> AH ---> AHA
         |--> AHF
```

C. Give an upper bound on (i) the branching factor; (ii) the depth
of the state space; (iii) the size of the state space.
*Your answer should be a general one, in terms of V, E, O, M, and the quota
Q, not an answer specific to the above graph.*

**Answer:**
(i) The branching factor is at most O. (ii) The depth is at most Q/M.
(iii) The size of the state space is therefore bounded by O^{Q/M}.
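Plugging the example's parameters into these bounds (a quick arithmetic check, with the quota Q = 12 taken from part D's example):

```python
# Upper bounds from part C, instantiated with O=4, M=1, Q=12.
O, M, Q = 4, 1, 12

branching_bound = O      # at most O children per state
depth_bound = Q // M     # every step adds at least M to the path value,
                         # so a goal is reached within Q/M steps
size_bound = O ** depth_bound

print(branching_bound, depth_bound, size_bound)  # 4 12 16777216
```

These are upper bounds only; the portion of the space actually generated by a search is usually far smaller.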

D. Suppose that the state space is structured so that the children of a given state are ordered alphabetically. What is the first goal state discovered by a depth-first search? By iterative deepening?

**Answer:**
The first goal discovered by depth-first search will be ACBACBA. The
first goal discovered by iterative deepening will be the shortest solution
ADEG.
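The contrast can be sketched in Python. The arcs and the values of A, B, C, D come from the text; the values of E, F, G, H are hypothetical, and the goal test uses a strict `>` so that the run reproduces the answers stated above:

```python
GRAPH = {"A": ["C", "D", "F", "H"], "B": ["A"], "C": ["B", "D"],
         "D": ["E"], "E": ["G"], "F": ["E"], "G": [], "H": ["A", "F"]}
VALUE = {"A": 2, "B": 3, "C": 1, "D": 5,
         "E": 2, "F": 1, "G": 4, "H": 1}  # E..H are hypothetical

def value(path):
    return sum(VALUE[v] for v in path)

def dfs(path, quota, limit=None):
    """Depth-first search, children in alphabetical order.  With a limit
    it becomes the depth-limited search used by iterative deepening."""
    if value(path) > quota:        # strict '>' matches the answers above
        return path
    if limit is not None and len(path) > limit:
        return None
    for v in sorted(GRAPH[path[-1]]):
        found = dfs(path + v, quota, limit)
        if found:
            return found
    return None

def iterative_deepening(start, quota):
    limit = 1
    while True:
        found = dfs(start, quota, limit)
        if found:
            return found
        limit += 1
```

Unbounded depth-first search dives down the alphabetically first branch, looping through the A-C-B cycle until the quota is passed, while iterative deepening returns the shortest goal path.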

**Answer:**
Local minima of the error function over the corpus of labelled examples,
as a function of the weights on the links. The learning algorithm
is in effect doing a hill-climbing search for the value of the weights
that gives the minimal error function. If the hill-climbing search finds
a local minimum of this function, it will get stuck at that suboptimal
state, and will not find the true solution.
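This failure mode is easy to reproduce in one dimension. The error surface below is an arbitrary illustration (not from the text) with a local minimum near w = -1 and the global minimum near w = 2; greedy descent started on the wrong side halts at the local minimum:

```python
def hill_climb(err, w, step=0.1, iters=1000):
    """Greedy descent on an error function of a single weight: move to a
    neighbouring weight only if that lowers the error.  The search halts
    at the first local minimum it reaches."""
    for _ in range(iters):
        best = min((w - step, w, w + step), key=err)
        if best == w:
            break
        w = best
    return w

# Toy error surface: local minimum near w = -1, global minimum near w = 2.
err = lambda w: (w + 1) ** 2 * (w - 2) ** 2 - 0.5 * w

w_found = hill_climb(err, -2.0)   # starts left of the hump
```

Starting at w = -2.0, the search settles near w = -1 even though the error there is far worse than at the global minimum, which is exactly the trap described above.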

**Answer:**
An example E is a false positive of rule R for concept C
if R accepts E but in fact E is not an instance of C.
It is a false negative
if R rejects E but in fact E is an instance of C.

For example, let C be the category "Makes a good pet", and let R be the conjecture, "X makes a good pet if and only if X is a mammal." A canary would be a false negative. A tiger would be a false positive.
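The definitions amount to two set comprehensions. A minimal sketch using the pet/mammal example (the animal list and its labels are illustrative):

```python
# Ground truth for the concept C ("makes a good pet") and the attribute
# the rule tests ("is a mammal"); the labels are illustrative.
is_mammal = {"canary": False, "tiger": True, "dog": True, "goldfish": False}
good_pet  = {"canary": True,  "tiger": False, "dog": True, "goldfish": True}

# R: "X makes a good pet if and only if X is a mammal."
rule = lambda x: is_mammal[x]

# False positive: R accepts x, but x is not an instance of C.
false_positives = [x for x in is_mammal if rule(x) and not good_pet[x]]
# False negative: R rejects x, but x is an instance of C.
false_negatives = [x for x in is_mammal if not rule(x) and good_pet[x]]
```

As in the text, the tiger comes out as a false positive and the canary as a false negative.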

**Answer:**

```
<[a,b,c,a,d], [c,a,d]>
<[a,b,c,a,d], [d]>
<[x,y,a,d,f,a,g,h], [a,g,h]>
<[x,y,a,d,f,a,g,h], [d,f,a,g,h]>
<[x,y,a,d,f,a,g,h], [g,h]>
...
```

These pairs are related by the constraint that the second element in the pair is always a tail of the first element. This relation can be defined as follows in Prolog:

```prolog
/* k_tail(L,M,K) succeeds if M is the Kth tail of L.
   For instance, k_tail([a,b,c,a,d],M,1) returns with M bound to [a,b,c,a,d];
   k_tail([a,b,c,a,d],M,2) returns with M bound to [b,c,a,d]; etc. */
k_tail(L,L,1).
k_tail([_|L],M,K) :- K > 1, K1 is K-1, k_tail(L,M,K1).
```

The theory of minimum description length (MDL) learning explains in what sense it is appropriate to learn this program from this data. What is this explanation?
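The relation is straightforward to mirror in Python, which makes it easy to check against the corpus pairs above (a sketch; the Prolog version is the program actually being learned):

```python
def k_tail(lst, k):
    """Python analogue of the Prolog k_tail relation: the kth tail of lst,
    counting the whole list as the 1st tail."""
    return lst[k - 1:]
```

For example, `k_tail(["a","b","c","a","d"], 3)` reproduces the pair `<[a,b,c,a,d], [c,a,d]>` from the corpus.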

**Answer:** If there are many lists in the corpus, or if the lists
are long, then it is possible to encode the data more compactly by

- 1. Giving the program for k_tail.
- 2. Stating the general rule "k_tail(X,Y,K)", where X and Y are the two columns of the data.
- 3. Giving a table with the X and K values rather than the X and Y values. Since the representation of K is considerably more compact than the representation of Y, this saves space.
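The saving can be made concrete by counting symbols for the five corpus pairs above. This is only a rough sketch: each list element and each index K is charged one symbol, and the fixed, one-time cost of the k_tail program itself is amortized away as the corpus grows:

```python
# The five <X, Y> pairs from the corpus.
pairs = [(["a","b","c","a","d"], ["c","a","d"]),
         (["a","b","c","a","d"], ["d"]),
         (["x","y","a","d","f","a","g","h"], ["a","g","h"]),
         (["x","y","a","d","f","a","g","h"], ["d","f","a","g","h"]),
         (["x","y","a","d","f","a","g","h"], ["g","h"])]

# Cost of storing the raw <X, Y> table: one symbol per list element.
raw_cost = sum(len(x) + len(y) for x, y in pairs)

# Cost of storing <X, K> instead: Y is replaced by a single index K,
# since Y can be regenerated as the Kth tail of X.
compressed_cost = sum(len(x) + 1 for x, y in pairs)

print(raw_cost, compressed_cost)  # 48 39
```

Even on five pairs the `<X, K>` table is shorter, and the gap widens with longer lists or a larger corpus, which is exactly the MDL justification for preferring the program-plus-indices encoding.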