- July 17, 2020

CSCI 567 Summer 2018 Sample Midterm DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO Name ID # Read the following instructions carefully; you will be expected to conform to them during the exam. • Multiple choice questions must be answered by filling in the appropriate segments on the Scantron form provided with a #2 pencil. These questions might not be weighted equally and partial credit may be available on some of them. • Free response questions should be answered concisely. Excessive writings will incur mark reduction. Derivations should be short — no need to prove/derive extensively. No derivations will require more than 10 lines. • Write legibly so the grader can read your answers without misunderstandings. Avoid cursive writings. • You must answer each free response question on the page provided. We will not provide blank new copies of pages. • The duration of the quiz is 90 minutes. Please budget your time on each question accordingly. • You may not leave your seat during the exam for any reason. If you leave your seat, you may be required to submit your exam at that moment. • The quiz has a total of 7 physical pages (including this cover), 19 multiple choice questions and 3 free response questions. Each question may have sub-questions. Once you are permitted to open your exam (and not before), you should count the pages to ensure that you have the right number. • This is a closed-book exam. Consulting classmates, the Internet, or other resources is NOT permitted. You may not use any electronics for any reason or say anything to a classmate for any reason. • There are some spaces in the packet that can be used for scratch work. No additional scratch paper will be provided. • When the proctor tells you the test is over, you must immediately cease writing and close the packet. Continuing to write after the exam is over will be reported for academic discipline, including at mini- mum an F in the class. Question/Section Possible Points Multiple choices 19 1 3 2 3 3 5 This is the inside cover of the exam. You may use this as scratch paper if needed. Your packet must remain attached throughout the exam – you may not remove this page for any reason. 2 Naive Bayes 1. The Super Bowl MVP award has been given out 53 times, although some individuals have won it more than once. We can break down the data for the award by where the winner went to college; suppose we have the following incomplete data for fifteen (15) instances of the award: University Times an alum has won Super Bowl MVP Michigan 5 USC 3 Florida State 2 Stanford 2 Louisville 1 UCLA 1 West Virginia 1 Suppose I want to use this (incomplete) data and a Naive Bayes approach to estimate the probability that someone from USC wins the next Super Bowl MVP award. What will my probability estimate be? (a) 3/15 (b) 15/53 (c) 3/53 (d) 3/7 (e) 1/7 Regularization Which of the following can be used as regularization to control model complexity? For questions 2 to 3, answer yes by filling in (A) for the appropriate Scantron row, or no by filling in (B). 2. R(w) = ‖w‖1 = ∑D d=1 |wd| 3. R(w) = ∑D d=1 w 3 d 3 Concept Learning For this problem, we will use the “sunburned?” domain from June 12’s discussion. Attributes in this problem were: • Hair Color, which could be blonde, brown, or red (3 choices) • Height, which could be short, average, or tall (3 choices) • Weight, which could be light, average, or heavy (3 choices) • Lotion?, indicating if this person applied sunscreen, which could be yes or no (2 choices) 4. One of the following is not in S = 〈blonde, ?, ?, no〉and G = 〈?, ?, ?, ?〉for the Candidate-Elimination algorithm. What is it? (a) 〈blonde, average, light, no 〉 (b) 〈blonde, tall, light, no 〉 (c) 〈blonde, average, heavy, no 〉 (d) 〈red, average, average, no 〉 (e) 〈blonde, average, light, no 〉 5. Given the above attributes, how many syntactically distinct hypotheses are in the hypothesis space? (a) 5 · 5 · 5 · 4 (b) 1 + 4 · 4 · 4 · 3 (c) 3 · 3 · 3 · 2 6. Given the above attributes, how many semantically distinct hypotheses are in the hypothesis space? (a) 5 · 5 · 5 · 4 (b) 1 + 4 · 4 · 4 · 3 (c) 3 · 3 · 3 · 2 4 K Nearest Neighbors classifier Questions 7 through 13 deal with the following data, where squares, triangles, and open circles are three different classes of data in the training set and the diamond (♦) and star (*) are test points. 1 1.5 2 2.5 3 1 1.5 2 2.5 3 7. Suppose I use all the training data as the full validation set (disregarding the two specially-marked test items) to classify the data using a KNN with k=1. How many of the 13 points will be correctly classified? (a) 0 (b) 1 (c) 11 (d) 12 (e) 13 8. For the KNN classifier with k=1, how many training data points will be misclassified with leave-one-out heuristic? (a) 0 (b) 1 (c) 11 (d) 12 (e) 13 9. What is the smallest value of k to always classify the diamond as class triangle? (a) 1 (b) 3 (c) 5 (d) 7 (e) 9 5 The following is the same diagram from the previous page. 1 1.5 2 2.5 3 1 1.5 2 2.5 3 10. What is the smallest value of k to classify the star as open circle? (a) 1 (b) 3 (c) 5 (d) 7 (e) cannot be done 11. What label will we predict for diamond with k = 1? (a) Square (b) Triangle (c) Circle (d) Cannot be determined from data available. 12. What label will we predict for star with k = 3? (a) Square (b) Triangle (c) Circle (d) Cannot be determined from data available. 13. What label will we predict for star with k = 5? (a) Square (b) Triangle (c) Circle (d) Cannot be determined from data available. 6 Decision tree Questions 14 through 19, as well as the two free response questions in this section, deal with the data listed in Table 1, which contains some information of several houses on the market. Table 1: List of houses Houses Lot (sqft) Built year Price ($) A 10454 1970 535k B 6969 1977 375k C 6524 1910 350k D 11019 1975 495k E 10500 1993 520k F 5719 2005 395k G 4356 2000 440k We would like to build a decision tree classifier for the house prices. For the following two questions, consider the following binary classification problem: Whether the price of a house is higher than $500k or not. Lot size (sq ft) ≤ 9000 Built year ≤ 1980 > 1980 > 9000 Tree 1 Lot size (sq ft) ≤ 9000 Built year ≤ 1973 > 1983 > 1973 and ≤ 1983 > 9000 Tree 2 14. How many points in the training data are correctly classified by tree 1, assuming the leaf nodes have the best-possible uniform answer in them? (a) 3 (b) 4 (c) 5 (d) 6 (e) 7 15. How many points in the training data are correctly classified by tree 2, assuming the leaf nodes have the best-possible uniform answer in them? (a) 3 (b) 4 (c) 5 (d) 6 (e) 7 7 Free Response Question 1 (3 points) Given an impurity measure QT (m) for the leaf m of tree T , let’s define the tree comparison measure as CT = (∑ m QT (m) ) + λ|T |, where λ is a regularization paramter and |T | measures the number of leaf nodes. If we use Gini index as impurity measure, find the value of λ such that these two trees are equally good. (Gini Index for leaf m with K classes: ∑K k=1 pˆmk(1− pˆmk)) 8 Boosting We would like to use boosting with decision stumps to classify the house prices. Given a set of weak classifiers H, in which each weak classifier is a mapping from x ∈ RD to y ∈ {+1,−1} , we would like to construct a strong classifier as h(x) = sign [ T−1∑ t=0 βtht(x) ] , where ht(x) ∈ H for t = 0, 1, · · · , T − 1 and sign(a) = { 1 a ≥ 0 −1 a < 0 . For the remaining questions in this section, consider the following binary classification problem: Whether the price of a house is higher than $400k or not. The scatter plot of houses is shown in Figure 1. Note that any dimension (other than price, obviously) from Table 1 is a possible dimension for our boosting algorithm. Figure 1: Scatter plot of houses with an example decision stump What are the first two decision stumps? Note the example included an example stump; your stumps should be from an empty start. The example is on attribute “built year” and is between houses A and C Decision stump example as shown in Figure 1 16. On which attribute is decision stump 1 based? (a) lot (b) built year 17. Decision stump 1 is between two houses. Which ones? (a) A and B (b) A and C (c) F and C (d) E and G (e) F and G 9 18. On which attribute is decision stump 2 based? (a) lot (b) built year 19. Decision stump 2 is between two houses. Which ones? (a) A and B (b) A and C (c) F and C (d) E and G (e) F and G Free Response Question 2 (3 points) What is the minimum number of decision stumps needed for boosting if we would like to classify the given samples perfectly? Why? 10 Linear Regression with input dependent noise (5 points) In linear regression, we have a model in which output is linearly dependent on input with respect to param- eters. y = wTx+ We also saw that if noise is a normal distribution (N (0, σ2)), maximizing likelihood leads to the sum of squares loss formulation. Here, we will consider a simple case where y and x are scalars and x, y ∈ R but the noise is dependent on input such that ∼ N (0, σ2x2) i.e. standard deviation scales linearly with the input. We observe N independent training samples denoted as D = {(xi, yi)}Ni=1. Model is y = wx+ Information: x ∼ N (µ, σ2) =⇒ p(x) = 1√ 2piσ exp{− (x−µ)22σ2 } 3.1 Write an expression for likelihood of data p(D|w) in terms of p(x), σ, xi, yi, w. 11 3.2 Find the maximum likelihood estimate wML of weights w. 3.3 Assuming that the prior distribution of w is N (0, α2), express the posterior distribution of w, i.e. p(w|D) in terms of p(x), σ, α, xi, yi, w. 12 3.4 Find the MAP estimate wMAP of weights w. 3.5 Show that MAP estimate (wMAP ) is biased whereas ML estimate (wML) is not. Information: An estimate wˆ is biased if E[wˆ] 6= w 13