Q1 (25 points): Let X and Y be two random variables. The values for P(X,Y) are shown in Table 1, H(X) is the entropy of X, and MI(X,Y) is the mutual information of X and Y. Write down the formulas and the results for the following.

(a) 1 pt: P(X)
(b) 1 pt: P(Y)
(c) 2 pts: P(X | Y)
(d) 2 pts: P(Y | X)
(e) 2 pts: Are X and Y independent? Why or why not?
(f) 2 pts: H(X)
(g) 2 pts: H(Y)
(h) 2 pts: H(X, Y)
(i) 2 pts: H(X | Y)
(j) 2 pts: H(Y | X)
(k) 2 pts: MI(X, Y)
(l) 5 pts: The values for Q(X,Y) are shown in Table 2. What is the value of KL(P(X,Y) || Q(X,Y))? What is the value of KL(Q(X,Y) || P(X,Y))? Are they the same?

Table 1: The joint probability P(X,Y)

          X=1     X=2     X=3
   Y=a    0.10    0.20    0.30
   Y=b    0.05    0.15    0.20

Table 2: The joint probability Q(X,Y)

          X=1     X=2     X=3
   Y=a    0.10    0.20    0.40
   Y=b    0.01    0.09    0.20

Q2 (10 points): Let X be a random variable for the result of tossing a coin. P(X = h) = p; that is, p is the probability of getting a head, and 1 - p is the probability of getting a tail.

(a) 1 pt: H(X) is the entropy of X. Write down the formula for H(X).
(b) 2 pts: Let p* = argmax_p H(X); that is, p* is the value of p that maximizes H(X). What is p*?
(c) 7 pts: Prove that your answer in (b) is correct. Hint: recall how you find the optimal solution for a function f(x) in your calculus class. In this case, H(X) is a function of p.

Q3 (25 points): Permutations and combinations:

(a) 6 pts: The class has n students, and n is an even number. The students are forming teams to work on their homework. Each team has exactly 2 students, and each student appears in exactly one team. How many distinct ways are there to form the teams for the class? Write down the formula. Hint: when n = 4, there are 3 ways. For instance, if students #1 and #2 are on the same team, students #3 and #4 must be on the same team too.
(b) 5 pts: There are 10 balls: 5 are red, 3 are blue, and 2 are white.
Suppose you put the balls in a line; how many different color sequences are there?
(c) 14 pts: Suppose you want to create a document of length N using only the words in a vocabulary Σ = {w1, w2, ..., wn}. Let [t1, t2, ..., tn] be a list of non-negative integers such that t1 + t2 + ... + tn = N.

(c1) 7 pts: How many different documents satisfy the condition that, for each wi in the vocabulary Σ, the occurrence of the word wi in the document is exactly ti? That is, how many different word sequences are there which contain exactly ti occurrences of wi for each wi in Σ? Hint: the answer to (c1) is very similar to the answer to (b).
(c2) 7 pts: Let P(X) be a unigram model on the vocabulary Σ; that is, P(X = wi) is the probability of the word wi, and sum_{wi ∈ Σ} P(X = wi) = 1. Suppose a document of length N is created with the following procedure: for each position in the document, you pick a word from the vocabulary according to P(X); that is, the probability of picking wi is P(X = wi). What is the probability that you end up with a document in which the occurrence of the word wi (for each wi ∈ Σ) is exactly ti? Hint: as (c1) shows, there are many documents that contain exactly ti occurrences of each wi. The answer to (c2) should be the sum of the probabilities of all these documents.

Q4 (10 points): Suppose you want to build a trigram POS tagger. Let T be the size of the tagset and V be the size of the vocabulary.

(a) 2 pts: Write down the formula for calculating P(w1, ..., wn, t1, ..., tn), where wi is the i-th word in a sentence and ti is the POS tag for wi.
(b) 8 pts: Suppose you will use an HMM package to implement a trigram POS tagger.
   • What does each state in the HMM correspond to? How many states are there?
   • What probabilities in the formula for (a) do the transition probability aij and the emission probability bjk correspond to? Here aij is the transition probability from state si to state sj, and bjk is the probability that state sj emits symbol ok.
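(Note: the assignment asks for formulas, not code, but the Q1 quantities can be sanity-checked numerically. The sketch below is illustrative only; the function names and the (x, y)-keyed dictionary layout are my own conventions, not part of the assignment.)

```python
import math

def marginals(joint):
    """Marginal distributions P(X) and P(Y) from a joint table keyed by (x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def entropy(dist):
    """H(Z) = -sum_z P(z) log2 P(z), skipping zero-probability entries."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """MI(X,Y) = sum_{x,y} P(x,y) log2 [ P(x,y) / (P(x) P(y)) ]."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def kl(p, q):
    """KL(P || Q) = sum_z P(z) log2 [ P(z) / Q(z) ].
    Assumes Q(z) > 0 wherever P(z) > 0, which holds for Tables 1 and 2."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

# Tables 1 and 2 from Q1, keyed by (X, Y).
P = {(1, 'a'): 0.10, (2, 'a'): 0.20, (3, 'a'): 0.30,
     (1, 'b'): 0.05, (2, 'b'): 0.15, (3, 'b'): 0.20}
Q = {(1, 'a'): 0.10, (2, 'a'): 0.20, (3, 'a'): 0.40,
     (1, 'b'): 0.01, (2, 'b'): 0.09, (3, 'b'): 0.20}
```

Useful identities for checking your hand computations: MI(X,Y) = H(X) + H(Y) - H(X,Y), and H(X | Y) = H(X,Y) - H(Y). Note that KL divergence is not symmetric in general, which is the point of Q1(l).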
Q5 (10 points): In a POS tagging task, let V be the size of the vocabulary (i.e., the number of words) and T be the size of the tagset. Suppose we want to build a classifier that predicts the tag of the current word using the following features:

1. Previous word w-1
2. Current word w0
3. Next word w+1
4. Surrounding words w-1 w+1
5. Previous tag t-1
6. Previous two tags t-2 t-1

(a) 3 pts: How many unique features are there in total? You just need to give the answer in Big-O notation (e.g., O(V^3)).
(b) 2 pts: A classifier predicts a class label y given the input x. In this task, what is x? What is y?
(c) 5 pts: For the sentence Mike/NN likes/VBP cats/NNS, write down the feature vector for each word in the sentence. The feature vector has the format "InstanceName classLabel featName1 val1 featName2 val2 ...". For the instanceName, just use the current word.

Q6 (10 points): Suppose you want to build a language identifier (LangID) that determines the language code of a given document. The training data is a set of documents, with the language code for each document specified. The test data is a set of documents, and your LangID needs to determine the language code of each document.

(a) 7 pts: How do you plan to build the LangID system? For instance, if you want to treat this as a classification problem, what would x (the input) be? What would y (the output) be? What would be good features? Name at least five types of features (e.g., one feature type is the word unigrams in the document).
(b) 3 pts: What factors (e.g., the amount of training data) could affect the system performance? Name at least three factors, excluding the amount of training data.

Q7 (10 "free" points): If you are not familiar with Mallet, go over the Mallet slides at the course website[1], set up the package in your patas environment, and run some experiments. We will use Mallet in later assignments. If you do not have a patas account, contact me right away.
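(Note: for Q5(c)-style feature vectors, the generation process can be sketched in code. The feature names such as "w0=..." and the BOS/EOS boundary padding symbols below are my own illustrative conventions; the assignment only fixes the "InstanceName classLabel featName1 val1 ..." line format.)

```python
def features(words, tags):
    """Emit one line per token in the format
    'InstanceName classLabel featName1 val1 featName2 val2 ...'
    using the six feature types from Q5; all feature values are 1 (binary).
    BOS/EOS are hypothetical padding symbols for sentence boundaries."""
    lines = []
    for i, (w, t) in enumerate(zip(words, tags)):
        prev_w = words[i - 1] if i > 0 else "BOS"
        next_w = words[i + 1] if i + 1 < len(words) else "EOS"
        prev_t = tags[i - 1] if i > 0 else "BOS"
        prev2_t = tags[i - 2] if i > 1 else "BOS"
        feats = [
            f"w-1={prev_w}",               # 1. previous word
            f"w0={w}",                     # 2. current word
            f"w+1={next_w}",               # 3. next word
            f"w-1w+1={prev_w}_{next_w}",   # 4. surrounding words
            f"t-1={prev_t}",               # 5. previous tag
            f"t-2t-1={prev2_t}_{prev_t}",  # 6. previous two tags
        ]
        lines.append(" ".join([w, t] + [f"{f} 1" for f in feats]))
    return lines

for line in features(["Mike", "likes", "cats"], ["NN", "VBP", "NNS"]):
    print(line)
```

For "likes", for example, this produces a line beginning "likes VBP" followed by the six feature/value pairs.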
[1] The third link at https://canvas.uw.edu/courses/1257221/pages/slides-from-prior-ling570

Submission: In your submission, include the following:

• readme.(txt|pdf) that includes your answers to Q1-Q6. No need to submit anything for Q7.
• Since this assignment does not require programming, there is no need to submit hw.tar.gz and no need to run the check_hwX.sh script.