
Tutoring Case: ECE 175A


ECE 175A — FINAL EXAM, WINTER 2020 — 100 Points Total
Due 5pm PST, Friday, March 20, 2020

[50 pts total] 1. Figure 1 shows a situation where two thermometers provide independent measurements of the scalar equilibrium temperature of a material or liquid surface.
Figure 1: Equilibrium temperature of a surface measured using two independent thermometers. (The figure shows the unknown equilibrium temperature $x$, the sensor gains $g_1$ and $g_2$, the sensor noises $n_1$ and $n_2$, and the two measurements $y_1 = g_1 x + n_1$ and $y_2 = g_2 x + n_2$.)
Assume that $g_1 = g_2 = 1$ and the noise variances $\sigma_1^2$ and $\sigma_2^2$ are known. This corresponds to the two temperature sensors being well calibrated prior to deployment. In this case, each thermometer is assumed to provide a separate, independent scalar measurement of the unknown, deterministic temperature $x$ according to $Y_i = x + N_i$ with $N_i \sim N(n; 0, \sigma_i^2)$ for $i = 1, 2$, where the sensor noises $N_1$ and $N_2$ are independent and the variances $\sigma_1^2$ and $\sigma_2^2$ are known. Realization values are denoted by $y_1 = Y_1$, $y_2 = Y_2$, $n_1 = N_1$ and $n_2 = N_2$.

[10 pts] (a) Place the two scalar measurement equations into the vector Gaussian Linear Model (GLM) form, $Y = Ax + N$, $N \sim N(n; 0, C)$, giving the dimensions of the realization values $y = Y$ and $n = N$, the dimensions of $x$, $A$, and $C$, and the values of the elements of $Y$, $A$ and $C$.

[10 pts] (b) Prove that the problem of finding the maximum likelihood estimate (MLE) of the actual temperature $x$ given the observed vector $y$ is equivalent to solving the minimization problem
$$\hat{x}_{\mathrm{mle}} = \arg\min_{x} \|y - Ax\|^2_{C^{-1}}.$$

[10 pts] (c) Determine the maximum likelihood estimate, $\hat{x}_{\mathrm{mle}}$, of the temperature $x$. In particular, show that the MLE can be written as $\hat{x}_{\mathrm{mle}}(y) = \alpha_1 y_1 + \alpha_2 y_2$ with $\alpha_i \ge 0$, $i = 1, 2$, and $\alpha_1 + \alpha_2 = 1$. Explicitly give the values of $\alpha_i$, $i = 1, 2$, in terms of the GLM model parameters, and show that your derived values are nonnegative and indeed satisfy the condition $\alpha_1 + \alpha_2 = 1$.

[10 pts] (d) Assuming that the value of $\sigma_2^2$ is fixed, determine the ML estimate $\hat{x}_{\mathrm{mle}}$ first in the limit that $\sigma_1^2 \to 0$ and then in the limit that $\sigma_1^2 \to \infty$. Explain why this behavior makes intuitive sense.

[10 pts] (e) Suppose the thermometers are two independent sensors on a space probe deployed to a moon of Jupiter and they are to be used to determine whether the temperature of a methane lake is either $x = \tau(1)$ or $x = \tau(2)$, because a well-accepted physical model says that the temperature of the lake must take one of these two values, and no other. These thermometers have been calibrated to ensure that $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Scientists have no a priori reason to prefer one model over the other, so we also assume that the two possibilities are a priori equally likely, $P(1) = P(2)$. Note that we are now taking the temperature to be a two-valued random variable $X$ with realization values $X = x \in \{\tau(1), \tau(2)\}$. We make the reasonable assumption that the sensor noises are independent of the methane lake temperature, $N \perp\!\!\!\perp X$. Given a single measurement of the observation vector $y$, determine the optimal 0/1-loss Bayes decision rule (BDR) in the form
$$g(y) = \omega^T (y - y_0) \;\overset{1}{\underset{2}{\gtrless}}\; 0$$
for deciding whether $x = \tau(1)$ or $x = \tau(2)$. Be sure to give the values of the parameter vectors $\omega$ and $y_0$ in terms of the model parameters. Give an interpretation of the above decision function in terms of a separating hyperplane.
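(Not part of the exam: the short NumPy sketch below illustrates the GLM setup in part (a) and the weighted-least-squares form of the MLE referred to in part (b). The noise variances and the "true" temperature are made-up illustrative values, and the closed form $\hat{x} = (A^T C^{-1} A)^{-1} A^T C^{-1} y$ used here is the standard GLM/weighted-least-squares solution, not something quoted from the problem statement.)

# Minimal sketch of the two-sensor GLM and its weighted-least-squares MLE.
# Illustrative values only; sigma_1^2, sigma_2^2 and x_true are assumptions.
import numpy as np

rng = np.random.default_rng(0)

x_true = 90.0                    # unknown deterministic temperature (illustrative)
sigma2 = np.array([0.5, 2.0])    # known noise variances sigma_1^2, sigma_2^2 (illustrative)

A = np.array([[1.0], [1.0]])     # GLM matrix for g1 = g2 = 1   (2 x 1)
C = np.diag(sigma2)              # noise covariance              (2 x 2)

# Simulate one realization y = A x + n with n ~ N(0, C)
y = A @ np.array([x_true]) + rng.normal(0.0, np.sqrt(sigma2))

# Weighted least squares: x_hat = (A^T C^-1 A)^-1 A^T C^-1 y
Cinv = np.linalg.inv(C)
x_hat = np.linalg.solve(A.T @ Cinv @ A, A.T @ Cinv @ y)

# For this A and diagonal C, the same estimate is an inverse-variance-weighted
# average of y1 and y2, i.e. the alpha_1 y1 + alpha_2 y2 form asked for in part (c).
alphas = (1.0 / sigma2) / np.sum(1.0 / sigma2)
print(x_hat.item(), alphas @ y)

(Note how the weights move toward (1, 0) or (0, 1) as $\sigma_1^2$ is made very small or very large, which is the limiting behavior part (d) asks about.)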
[50 pts total] 2. Figure 2 illustrates a data processing procedure in statistical learning, denoted as whitening. Given data sampled from a random vector $X$ taking realization values $X = x \in \mathbb{R}^n$ and with covariance $\mathrm{Cov}(X) = \Sigma_x$, this consists of finding the "whitening" transformation $Y = TX$ such that the random vector $Y$, taking realization values $Y = y \in \mathbb{R}^n$, has identity covariance $\mathrm{Cov}(Y) = \Sigma_y = I$ = identity matrix.

Figure 2: The whitening procedure.

[10 pts] (a) Let $m_x = E\{X\}$ and $m_y = E\{Y\}$. First show that $\Sigma_y = T \Sigma_x T^T$. Next, determine an equation for the whitening transformation $T$ that will result in $\Sigma_y = I$ using the principal component analysis (PCA) decomposition of $X$. (See footnote 1.)

[Footnote 1: Recall that if $D = \mathrm{diag}(d_1, \cdots, d_n)$ is an arbitrary full-rank diagonal matrix, then $D^\alpha = \mathrm{diag}(d_1^\alpha, \cdots, d_n^\alpha)$ for any positive or negative rational number $\alpha$. For a diagonal matrix $D$ it is also the case that $D^\alpha D^\beta = D^{\alpha + \beta}$. Thus $D = D^{1/2} D^{1/2}$, $I = D^0 = D^1 D^{-1} = D D^{-1}$, $I = D^{-1/3} D^{1/3}$, etc.]

[10 pts] (b) Assume that the covariance of $X$ has principal components $\phi_1 = \frac{\sqrt{2}}{2}(1, 1)^T$ and $\phi_2 = \frac{\sqrt{2}}{2}(-1, 1)^T$ with associated principal values $\sigma_1^2 = 2$ and $\sigma_2^2 = 8$. Determine the whitening transformation matrix $T$.

[10 pts] (c) A generalization of the whitening procedure is to apply a transformation $G$ such that $Y = GX$ has an arbitrary covariance matrix $\mathrm{Cov}(Y) = \Sigma_y$. Since this presumes that the eigenvalues, also known as the spectral values or spectrum, can be arbitrarily assigned, this generalization is known as spectral shaping. Determine an equation for the transformation $G$ based on the PCA decompositions of $\Sigma_x$ and $\Sigma_y$. Hint: You already have a transformation $\Sigma_x \xrightarrow{T} I$; now determine a transformation $I \xrightarrow{P} \Sigma_y$ so that $\Sigma_x \xrightarrow{G = PT} \Sigma_y$.

[10 pts] (d) Suppose that we want the spectrally shaped covariance $\Sigma_y$ to have principal components $u_1 = \frac{1}{2}(1, \sqrt{3})^T$ and $u_2 = \frac{1}{2}(-\sqrt{3}, 1)^T$ and principal values $\gamma_1^2 = 3$ and $\gamma_2^2 = 12$. Determine the transformation matrix $G$.
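(Not part of the exam: a NumPy sketch of one standard PCA-based way to build the whitening and spectral-shaping maps described in parts (a)-(d), using the principal components and values given in parts (b) and (d). The particular constructions $T = \Lambda^{-1/2}\Phi^T$ and $P = U\Gamma^{1/2}$ are assumptions of this sketch, one valid choice among several, not necessarily the form your derivation should take.)

# Sketch: PCA whitening T and spectral shaping G = P T, verified numerically.
import numpy as np

s = np.sqrt(2) / 2
Phi = np.column_stack([[s, s], [-s, s]])          # principal components phi_1, phi_2
Lam = np.diag([2.0, 8.0])                         # principal values sigma_1^2, sigma_2^2
Sigma_x = Phi @ Lam @ Phi.T

U = np.column_stack([[0.5, np.sqrt(3)/2], [-np.sqrt(3)/2, 0.5]])  # u_1, u_2
Gam = np.diag([3.0, 12.0])                        # principal values gamma_1^2, gamma_2^2
Sigma_y = U @ Gam @ U.T

T = np.diag(1.0 / np.sqrt(np.diag(Lam))) @ Phi.T  # whitens: T Sigma_x T^T = I
P = U @ np.diag(np.sqrt(np.diag(Gam)))            # shapes:  P I P^T = Sigma_y
G = P @ T                                         # overall: G Sigma_x G^T = Sigma_y

print(np.allclose(T @ Sigma_x @ T.T, np.eye(2)))  # True
print(np.allclose(G @ Sigma_x @ G.T, Sigma_y))    # True

(Other square roots of $\Sigma_x^{-1}$, for example the symmetric choice $\Phi\Lambda^{-1/2}\Phi^T$, whiten equally well, which is why this particular $T$ is labeled an assumption of the sketch.)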
[10 pts] (e) Joe has an idea to create a company, DogFind Inc., to help people buy dogs with faces that match their owners' faces. (This problem was suggested by Prof. Nuno Vasconcelos.) DogFind has contracted with many dog breeders in the country and maintains a database of pictures of the faces of a large number of dogs, including those that are currently for sale. Let's call these dog face images $y_i$, $i = 1, \ldots, n$, and view them as instantiations of a random vector $Y$. DogFind has also obtained a large database of pictures of people's faces. Let's call these $x_j$, $j = 1, \ldots, m$, and view them as a collection of instantiations of a random vector $X$. Joe believes that from these datasets he can estimate a useful distance function $D(y, x)$ between dog and people faces (let's call this "Joe's distance function"), and from it implement a nearest-neighbor pet-assignment algorithm as follows: given a picture $x$ of a (human) customer, the dog that is likely to have the most similar face is selected according to
$$y^*(x) = \arg\min_{y_i \in A} D(y_i, x),$$
where $A$ is the set of all dogs that are currently available for sale and $D(\cdot, \cdot)$ is a distance function proposed by Joe, to be further discussed below. This dog is then recommended to the (human) customer. Let's step through how Joe thinks this could be done.

i. First Joe wants to estimate the covariances $\Sigma_x$ and $\Sigma_y$ using his databases. Give the standard MLE form of these estimates that are used when $X$ and $Y$ are assumed to be Gaussian. (Note that you do NOT have to derive these formulas, just write them down.)

ii. Next Joe wants to transform any human face image $x$ (that lives in "human-face space") into "dog-face space" using $y = Gx$, where $G$ is the spectral shaping transformation $Y = GX$ that takes $\Sigma_x$ to $\Sigma_y$. Explain in words how you would determine this transformation once you've computed estimates of the covariance matrices $\Sigma_x$ and $\Sigma_y$.

iii. Joe makes the highly simplifying assumption that $Y \sim N(y; m_y, \Sigma_y)$. Note that dog faces and human faces are independent, $Y \perp\!\!\!\perp X$, so that $Y \perp\!\!\!\perp GX$. Thus, conditioned on $X = x$, the random variable $Y - Gx$ is conditionally Gaussian with conditional mean $m_y - Gm_x$ and covariance $\Sigma_y$. Joe makes the further simplifying assumption that $m_y - Gm_x \approx 0$. (Yes, Joe is an optimist!) Then $Y - Gx$ is conditionally Gaussian with covariance $\Sigma_y$, which is equivalent to saying that $Y$ is conditionally Gaussian with mean $Gx$ and covariance $\Sigma_y$,
$$P(y|x) = N(y; Gx, \Sigma_y).$$
Given this posterior distribution of $y$ given $x$, a rational estimate is the Maximum A Posteriori (MAP) estimate, as this can be related to a zero-one loss for estimating the correct value of a continuous random vector $Y$,
$$y^*(x) = \arg\max_{y} P(y|x).$$
It is also the case (proven in advanced courses on Machine Learning) that this provides the optimal estimator that solves the problem
$$y^*(\cdot) = \arg\min_{f(\cdot)} E\{\|Y - f(X)\|_1\},$$
where $\|z\|_1$ denotes the one-norm, $\|z\|_1 = \sum_{i=1}^{d} |z_i|$. However, Joe can't optimize over the space of all possible dogs, but only over the discrete set of purchasable dogs, $A$. Thus, practically, Joe determines an estimate for the "best dog" via the optimization
$$y^*(x) = \arg\max_{y_i \in A} P(y_i|x),$$
using the approximation $P(y|x) = N(y; Gx, \Sigma_y)$, with $\Sigma_y$ and $G$ approximately determined in the manner discussed above. Given Joe's model, determine a distance function $D(y_i, x)$ for which
$$y^*(x) = \arg\min_{y_i \in A} D(y_i, x) = \arg\max_{y_i \in A} P(y_i|x),$$
and relate this to the Mahalanobis distance.
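(Not part of the exam: a small numerical sketch of the selection rule described in part (e)iii, evaluating the Gaussian log-density $N(y_i; Gx, \Sigma_y)$ over a finite candidate set and comparing it with a Mahalanobis-type nearest-neighbor rule; the relationship between the two is what the question asks you to establish analytically. All quantities below, including the dimension, $G$, $\Sigma_y$, $x$ and the candidate set, are made-up illustrative values.)

# Sketch: argmax of a Gaussian density over a finite candidate set vs. a
# Mahalanobis-type nearest neighbor. Illustrative random data only.
import numpy as np

rng = np.random.default_rng(1)
d = 4
G = rng.normal(size=(d, d))                   # illustrative spectral-shaping matrix
Sigma_y = np.cov(rng.normal(size=(d, 100)))   # illustrative positive-definite covariance
Sy_inv = np.linalg.inv(Sigma_y)

x = rng.normal(size=d)                        # one "human face" feature vector
A_set = rng.normal(size=(10, d))              # candidate "dog face" vectors y_1..y_10

def log_gaussian(y, mean, cov_inv):
    # log N(y; mean, cov) up to an additive constant, which does not affect the argmax
    diff = y - mean
    return -0.5 * diff @ cov_inv @ diff

def sq_mahalanobis(y, mean, cov_inv):
    diff = y - mean
    return diff @ cov_inv @ diff

best_map = max(range(len(A_set)), key=lambda i: log_gaussian(A_set[i], G @ x, Sy_inv))
best_nn  = min(range(len(A_set)), key=lambda i: sq_mahalanobis(A_set[i], G @ x, Sy_inv))
print(best_map == best_nn)                    # both rules pick the same dog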
iv. Joe further makes the claim that instead of using his distance function $D(\cdot, \cdot)$, one can transform both $y$ and $x$ into the same whitened space and then apply the regular Euclidean distance to determine the dog $y_i$ which is closest to $x$. Determine if Joe is correct.

MATH FACTS

"Difference of Two Squares" Vector-Matrix Identity
$$M = M^T \implies a^T M a - b^T M b = (a - b)^T M (a + b)$$

The Normal Distribution
Recall that the normal distribution for a random variable/vector $X$, with realization value $X = x \in \mathbb{R}^n$, $n \ge 1$, is given by
$$N(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \, e^{-\frac{1}{2} \|x - \mu\|^2_{\Sigma^{-1}}}$$
where
$$\|x - \mu\|^2_{\Sigma^{-1}} = (x - \mu)^T \Sigma^{-1} (x - \mu) = d^2(x, \mu)$$
and $d(x, \mu)$ is the Mahalanobis distance between $x$ and $\mu$. Note that $n = 1$ gives the standard scalar normal distribution by setting $\Sigma = \sigma^2$.

Vector Derivative Properties
Let $\ell(\cdot): \mathbb{R}^n \to \mathbb{R}$ and $f(\cdot): \mathbb{R}^n \to \mathbb{R}^m$. Let $x = (x_1, \cdots, x_n)^T \in \mathbb{R}^n$ and $d \in \mathbb{R}^n$. We have
$$\frac{\partial}{\partial x} \ell(x) \stackrel{\text{def}}{=} \left( \frac{\partial}{\partial x_1} \ell(x), \cdots, \frac{\partial}{\partial x_n} \ell(x) \right) \quad (1 \times n \text{ row vector})$$
$$\nabla \ell(x) = \nabla_x \ell(x) \stackrel{\text{def}}{=} \left( \frac{\partial}{\partial x} \ell(x) \right)^T \quad (n \times 1 \text{ column vector})$$
$$J_f(x) = \frac{\partial}{\partial x} f(x) \stackrel{\text{def}}{=} \begin{pmatrix} \frac{\partial}{\partial x} f_1(x) \\ \vdots \\ \frac{\partial}{\partial x} f_m(x) \end{pmatrix} \quad (m \times n \text{ Jacobian matrix})$$
$$H(x) = \frac{\partial^2}{\partial x^2} \ell(x) \stackrel{\text{def}}{=} \frac{\partial}{\partial x} \nabla_x \ell(x) = \frac{\partial}{\partial x} \left( \frac{\partial}{\partial x} \ell(x) \right)^T \quad (n \times n \text{ Hessian matrix})$$
$$\ell(x + d) = \ell(x) + d^T \nabla \ell(x) + \tfrac{1}{2} d^T H(x) d + \text{higher order terms} \quad (\text{Taylor series expansion about } x)$$
$$\frac{\partial}{\partial x} c^T x = c^T, \quad \frac{\partial}{\partial x} Ax = A, \quad \frac{\partial}{\partial x} x^T A x = x^T A + x^T A^T$$
$$A = A^T \implies \frac{\partial}{\partial x} x^T A x = 2 x^T A, \quad \frac{\partial}{\partial x} f(y(x)) = \frac{\partial f(y)}{\partial y} \frac{\partial y(x)}{\partial x}$$
$$\frac{\partial}{\partial x} f^T(x) g(x) = f^T(x) \frac{\partial g(x)}{\partial x} + g^T(x) \frac{\partial f(x)}{\partial x}$$
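(Not part of the exam: a quick numerical sanity check of two of the MATH FACTS above, the "difference of two squares" identity for symmetric $M$ and the fact that the Gaussian exponent is $-\tfrac{1}{2}$ times the squared Mahalanobis distance, cross-checked against scipy.stats.multivariate_normal on random test values.)

# Numerical check of the symmetric difference-of-two-squares identity and the
# Mahalanobis form of the multivariate normal density. Random test values only.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
n = 3
M = rng.normal(size=(n, n)); M = M + M.T                      # symmetric M
a, b = rng.normal(size=n), rng.normal(size=n)
lhs = a @ M @ a - b @ M @ b
rhs = (a - b) @ M @ (a + b)
print(np.isclose(lhs, rhs))                                   # True

mu = rng.normal(size=n)
S = rng.normal(size=(n, n)); Sigma = S @ S.T + n * np.eye(n)  # positive definite
x = rng.normal(size=n)
maha2 = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
dens = np.exp(-0.5 * maha2) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
print(np.isclose(dens, multivariate_normal(mu, Sigma).pdf(x)))  # True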
