
Tutoring Case: COMP0088

May 15, 2020

Alternative assessment: Introduction to Machine Learning, COMP0088
Main Summer Examination period, 2019/20
Suitable for Cohorts: 2019/20, 2018/19, 2017/18

Distribution of Marks:
22: Support Vector Machines
10: Loss functions and margins
20: Loss-based learning and regularization
8: Neural Network training
40: Computational assignment
100: Total

Marks for each part of each question are indicated in square brackets. There are NINE questions in total. Answer all questions.

For all True/False questions: always support your reply with a short answer, using at most three sentences, or approximately 10-50 words in total (whichever suits you best). You may find it easier to reply using equations – in that case there is no sentence or word count. If you do not provide a justification, your answer will not be taken into consideration, whether true or false. Calculators are permitted.

Support Vector Machines [22 Marks]

1. Kernels can be composed by using addition and multiplication, yielding new kernels that satisfy the Mercer condition. This means that the sum of two kernels is a kernel, and the product of two kernels is a kernel. You are asked to show this by expressing the resulting kernels in terms of the inner products of the original kernels.

a. Suppose $K_1(x, x') = \psi_1(x)^T \psi_1(x')$ and $K_2(x, x') = \psi_2(x)^T \psi_2(x')$. Let $K_S$ be the sum kernel, $K_S(x, x') = K_1(x, x') + K_2(x, x')$. Find a feature map $\psi_S$ such that $K_S(x, x') = \psi_S(x)^T \psi_S(x')$. [7 marks]

b. Suppose $K_1(x, x') = \psi_1(x)^T \psi_1(x')$ and $K_2(x, x') = \psi_2(x)^T \psi_2(x')$, where $\psi_1(x) \in \mathbb{R}^D$ and $\psi_2(x) \in \mathbb{R}^K$ are vectors of different dimensions. Let $K_P$ be the product kernel, $K_P(x, x') = K_1(x, x') K_2(x, x')$. Indicate the dimension and the expression of a feature map $\psi_P$ such that $K_P(x, x') = \psi_P(x)^T \psi_P(x')$. [15 marks]

Loss functions and margins [10 Marks]

2. Consider a point that is correctly classified and distant from the decision boundary. Why would the SVM's decision boundary be unaffected by this point, while the one learned by logistic regression would be affected? Argue mathematically in terms of the loss functions used to train the two models, and how they penalize the margin associated with the point in question. Specify what exactly 'distant' means in the claim above. [10 marks]
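A numerical sanity check for Question 1. The sketch below picks two arbitrary feature maps ($\psi_1$, $\psi_2$ and the input dimension are illustrative choices, not given in the paper) and verifies that concatenating features realizes the sum kernel, while flattening the outer product of the two feature vectors realizes the product kernel, giving $\psi_P$ dimension $D \cdot K$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature maps; any psi_1, psi_2 would do for this check.
def psi_1(x):  # R^2 -> R^3, so D = 3
    return np.array([x[0], x[1], x[0] * x[1]])

def psi_2(x):  # R^2 -> R^2, so K = 2
    return np.array([x[0] ** 2, x[1] ** 2])

x, xp = rng.standard_normal(2), rng.standard_normal(2)
K1 = psi_1(x) @ psi_1(xp)
K2 = psi_2(x) @ psi_2(xp)

# Sum kernel: concatenation gives a feature map of dimension D + K.
def psi_S(x):
    return np.concatenate([psi_1(x), psi_2(x)])

# Product kernel: the flattened outer product gives dimension D * K,
# since sum_ij a_i b_j a'_i b'_j = (sum_i a_i a'_i)(sum_j b_j b'_j).
def psi_P(x):
    return np.outer(psi_1(x), psi_2(x)).ravel()

assert np.isclose(psi_S(x) @ psi_S(xp), K1 + K2)
assert np.isclose(psi_P(x) @ psi_P(xp), K1 * K2)
print("sum and product kernel identities verified")
```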
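For Question 2, the relevant comparison is between the hinge loss $\max(0, 1 - m)$ used by the SVM and the logistic loss $\log(1 + e^{-m})$, both viewed as functions of the margin $m = y\, w^T x$. A short numeric sketch (the margin values are arbitrary):

```python
import numpy as np

# Margins m = y * w^T x; 'distant and correctly classified' means m >= 1,
# i.e. the point lies outside the SVM margin.
m = np.array([0.5, 1.0, 2.0, 5.0])

hinge = np.maximum(0.0, 1.0 - m)        # SVM training loss
logistic = np.log1p(np.exp(-m))         # logistic-regression loss

d_hinge = np.where(m < 1.0, -1.0, 0.0)  # dL/dm: exactly zero once m >= 1
d_logistic = -1.0 / (1.0 + np.exp(m))   # dL/dm: nonzero for every finite m

for row in zip(m, hinge, logistic, d_hinge, d_logistic):
    print("m=%4.1f  hinge=%6.3f  logistic=%6.4f  d_hinge=%5.2f  d_logistic=%8.5f" % row)
```

Once $m \ge 1$, the hinge loss and its derivative vanish, so such a point contributes nothing to the gradient of the SVM objective; the logistic derivative decays exponentially but never reaches zero, so the point still pulls on the logistic-regression boundary.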
Loss-based learning and regularization [20 Marks]

3. You are provided with a training set $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$. Your training objective is expressed as the weighted sum of a data-dependent term, $D(w)$, and an $\ell_2$ regularization term, $R(w)$:

$$J_\lambda(w) = \underbrace{\frac{1}{N} \sum_{i=1}^{N} \ell(x_i, y_i)}_{D(w)} + \underbrace{\lambda \|w\|_2^2}_{R(w)} \qquad (1)$$

The data-dependent term penalizes the deviation of the model predictions from the target values on the training set, e.g. for least-squares prediction we have $\ell(x_i, y_i) = (y_i - w^T x_i)^2$. For a given value of $\lambda$ we denote by $w^*_\lambda$ the optimum of the associated optimization problem:

$$w^*_\lambda = \arg\min_w J_\lambda(w) \qquad (2)$$

We assume that we can always compute the global optimum of $J_\lambda(w)$ with respect to $w$ – this is indeed the case for linear regression, logistic regression, or SVM training. We are interested in understanding how the learned parameter vector and the associated cost terms are affected by changes in $\lambda$. You are asked to indicate which of the following statements is true or false, while justifying your answers (please consult the instructions at the beginning of this test – if you do not provide a valid justification, your answer will not be taken into consideration).

a. Decreasing $\lambda$ will result in a decrease of $R(w^*_\lambda)$. [5 marks]

b. Decreasing $\lambda$ will result in a decrease of $D(w^*_\lambda)$. [5 marks]

c. Decreasing $\lambda$ will result in a decrease of $J(w^*_\lambda)$. Here, rather than a verbal justification, you need to prove this mathematically. [5 marks]

d. Increasing $\lambda$ always improves generalization performance. [5 marks]

Neural Network training [8 Marks]

4. Consider the task of setting hyperparameters for a neural network; in particular, adjusting the learning rate, momentum and minibatch size for SGD, weight decay, dropout, the number of layers and the number of neurons per layer. Recall that hyperparameters are often tuned using a validation set.

a. Among the above hyperparameters, indicate which it is possible to tune on the training set (rather than a validation set). Briefly explain your answer. [3 marks]

b. Indicate which hyperparameters should be tuned on a validation set, rather than the training set. Briefly explain your answer. [5 marks]

Computational Assignment [40 Marks]

Please follow the instructions of the jupyter notebook accompanying this exam and return your answers to this part in the form of a jupyter notebook. The assignment has two parts, each comprising several questions. Their contributions to the total mark are detailed below.

Part A: Autoencoders [25 Marks]
Question 1 Autoencoder-based PCA: [10 marks]
Question 2 2-layer autoencoder: [5 marks]
Question 3 3-layer autoencoder: [10 marks]

Part B: Latent space-based synthesis [15 Marks]
Question 1 Autoencoder-based PCA: [10 marks]
Question 2 2-layer autoencoder: [5 marks]

[Total 100 marks]

END OF PAPER
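The behaviour asked about in Question 3 can be explored empirically with ridge regression, where $w^*_\lambda$ is available in closed form. Everything in the sketch below (data sizes, noise level, the $\lambda$ grid) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression data; sizes and noise are purely illustrative.
N, d = 50, 5
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.3 * rng.standard_normal(N)

for lam in [10.0, 1.0, 0.1, 0.01]:
    # Minimizer of (1/N)||y - Xw||^2 + lam*||w||^2: setting the gradient
    # to zero gives (X^T X + N*lam*I) w = X^T y.
    w = np.linalg.solve(X.T @ X + N * lam * np.eye(d), X.T @ y)
    D = np.mean((y - X @ w) ** 2)   # data term D(w*)
    R = lam * (w @ w)               # regularizer R(w*) = lam * ||w*||^2
    print(f"lam={lam:5.2f}  D={D:.4f}  ||w*||^2={w @ w:.4f}  J={D + R:.4f}")
```

The table shows $D(w^*_\lambda)$ and $J_\lambda(w^*_\lambda)$ shrinking and $\|w^*_\lambda\|_2^2$ growing as $\lambda$ decreases – numerical evidence, not a proof, of the monotonicity arguments the question asks for.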
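For Question 4, the core observation is that the training loss never benefits from regularization, so a regularization hyperparameter tuned on the training set alone is driven towards zero. The sketch below illustrates this with ridge regression standing in for weight decay (the split sizes and the $\lambda$ grid are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: choose a weight-decay strength on held-out data.
N, d = 80, 5
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(N)
X_tr, y_tr, X_va, y_va = X[:60], y[:60], X[60:], y[60:]

def fit_ridge(X, y, lam):
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

for lam in [1.0, 0.1, 0.01, 0.001, 0.0]:
    w = fit_ridge(X_tr, y_tr, lam)
    tr = np.mean((y_tr - X_tr @ w) ** 2)
    va = np.mean((y_va - X_va @ w) ** 2)
    print(f"lam={lam:6.3f}  train MSE={tr:.4f}  val MSE={va:.4f}")

# Training MSE is always minimized at lam = 0, so 'tuning' a regularizer
# on the training set simply removes it; validation error need not
# behave this way, which is why such hyperparameters need a held-out set.
```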
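On Part A of the computational assignment: a classical fact behind "autoencoder-based PCA" is that a linear autoencoder trained on squared reconstruction error learns the same subspace as the top principal components. The self-contained numpy sketch below is illustrative only – the actual notebook's data and required interface are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 200 points in R^10 lying near a 2-D subspace (illustrative).
Z = rng.standard_normal((200, 2))
A = rng.standard_normal((2, 10))
X = Z @ A + 0.05 * rng.standard_normal((200, 10))
X = X - X.mean(axis=0)          # centre the data, as PCA does

# Linear autoencoder x -> W_d(W_e x), trained by gradient descent on the
# mean squared reconstruction error; with no nonlinearity its optimum
# spans the same subspace as the top-2 principal components.
k, lr = 2, 0.01
W_e = 0.1 * rng.standard_normal((10, k))   # encoder weights
W_d = 0.1 * rng.standard_normal((k, 10))   # decoder weights

for step in range(2000):
    H = X @ W_e                       # latent codes
    G = 2.0 * (H @ W_d - X) / len(X)  # dMSE/d(reconstruction)
    grad_d = H.T @ G                  # dMSE/dW_d
    grad_e = X.T @ (G @ W_d.T)        # dMSE/dW_e
    W_d -= lr * grad_d
    W_e -= lr * grad_e

print("final reconstruction MSE:", np.mean((X @ W_e @ W_d - X) ** 2))
```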
