- August 29, 2020

CS4131
THE UNIVERSITY OF WARWICK
LEVEL 7 Open Book Assessment
2 hours
Department of Computer Science
CS413 Image and Video Analysis

Instructions

1. Read all instructions carefully and read through the entire paper before you start writing.
2. You should attempt 4 questions. You should NOT submit answers to more than the required number of questions.
3. All questions will carry the same number of marks.
4. You should handwrite your answers either with paper and pen or using an electronic device with a stylus (unless you have special arrangements for exams which allow the use of a computer).
5. Begin each question on a new page and clearly mark each page with the page number, your student ID and the question number.
   (a) Handwritten notes must be scanned or photographed and all individual solutions should (if possible) be collated into a single PDF with pages in the correct order.
   (b) You must upload two files to the AEP: your PDF of solutions and a completed cover sheet.
   (c) You must click FINISH ASSESSMENT to complete the submission process. After you have done so you will not be able to upload anything further.
6. Please check the legibility of your final submission before uploading. It is your responsibility to ensure that your work can be read.
7. You are allowed to access module materials, notes, resources, references and the internet during the assessment.
8. You should not try to communicate with any other candidate during the assessment period or seek assistance from anyone else in completing your answers. The Computer Science Department expects the conduct of all students taking this assessment to conform to the stated requirements. Measures will be in operation to check for possible misconduct. These will include the use of similarity detection tools and the right to require live interviews with selected students following the assessment.
9. By starting this assessment, you are declaring yourself fit to undertake it.
   You are expected to make a reasonable attempt at the assessment by answering the questions in the paper.

Please note that:

• You must have completed and uploaded your assessment before the 24-hour assessment window closes.
• You have an additional 45 minutes beyond the stated length of the paper to allow for downloading and uploading the assessment, your files and technical delays.
• For further details you should refer to the AEP documentation.

Notify [email protected] as soon as possible if you cannot complete your assessment because:

• you lose your internet connection;
• your device fails;
• you become unwell and are unable to continue;
• you are affected by circumstances beyond your control (e.g. fire alarm).

Please note that this is for notification purposes; it is not a help line.

1. This question is about the Human Visual System (HVS).

   (a) Sketch the anatomy of the eye and describe the structure of the retina with reference to colour perception. What image analysis is performed by the eye? Does it matter that the eye produces an upside-down, left-to-right image of the world? What are retinotopic maps and what do they tell us about how the HVS processes information? In your answer, where appropriate, give specific examples of image processing operations performed by the eye and the brain.
       [10]

   (b) Describe in detail the visual pathway of the HVS. Giving an example, explain why perception is not simply a feed-forward process. What are the similarities between how we think the HVS works and how artificial neural networks are used to learn and perform visual tasks?
       [15]

2. (a) Giving definitions and simple examples, explain how a 1D Discrete Fourier Transform works. What is the relationship between filtering using convolution and filtering in the frequency domain?
       [6]

   (b) A 2D Discrete Cosine Transform can be derived from the 1D discrete projections using functions of the form:

       $$g(x, u) = \alpha(u) \cos\left[\frac{\pi}{2N}(2x + 1)u\right],$$

       where $\alpha(u)$ is

       $$\alpha(u) = \begin{cases} \sqrt{\dfrac{1}{N}}, & u = 0 \\ \sqrt{\dfrac{2}{N}}, & 1 \le u < N. \end{cases}$$

       Carefully explain this equation and how it can be used to perform decomposition of a 1D signal $f[x]$, $0 \le x < N$, into DCT coefficients, $F[u]$. Giving reasons, explain how the signal can be synthesised given the DCT coefficients, $F[u]$, $0 \le u < N$.
       [5]

       What is the 2D form of the forward DCT expansion given the 1D form above?
       [3]

   (c) Explain why blocking artefacts can be observed in an image compressed by the standard JPEG technique at low bit rates. How might blocking artefacts be reduced?
       [6]

   (d) A webcam generates 8-bit monochrome video frames of resolution 320 × 240 pixels at the rate of 10 frames per second (FPS). Calculate the total number of bytes in the stream and the compression ratio achieved by a simple DCT coder which uses 8 × 8 blocks using the quantisation scheme:

       $$\begin{bmatrix}
       8 & 8 & 6 & 6 & 4 & 4 & 2 & 2 \\
       8 & 8 & 6 & 6 & 4 & 4 & 2 & 2 \\
       6 & 6 & 6 & 6 & 4 & 4 & 2 & 2 \\
       6 & 6 & 6 & 6 & 4 & 4 & 2 & 2 \\
       4 & 4 & 4 & 4 & 4 & 4 & 2 & 2 \\
       4 & 4 & 4 & 4 & 4 & 4 & 2 & 2 \\
       2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
       2 & 2 & 2 & 2 & 2 & 2 & 2 & 2
       \end{bmatrix}$$

       i. if all the coefficients of every block are encoded
       ii. if only frequency coefficients at $0 \le (u, v) \le 3$ are encoded

       You can assume that no other type of compression is applied to the resulting encoded data stream.
       [5]

3. (a) What are the major problems for background subtraction algorithms? Give examples of video systems which might use background subtraction.
       [6]

   (b) A sequence of video frames, $f(x, y, t)$, is being processed by a background modelling method using a running-Gaussian model and a learning update rule:

       i. What parameters define the background model? For image frames of size 1280 × 720 using 64-bit floating point arithmetic, what is the approximate memory size of the model in bytes?
          [4]

       ii. Give the update equations for the model at frame $t + 1$.
           [4]

       iii. If the foreground classifier uses a significance of $K$ standard deviations, state the classifier rule for deciding if a frame pixel is foreground.
            [2]

   (c) The Stauffer-Grimson background model uses a Gaussian mixture distribution. What is the advantage of this over a running average or a single running-Gaussian model?
       [2]

   (d) A two-component GMM is used to model background with weights $w_1 = w_2 = 0.5$, means $\mu_1 = 85$, $\mu_2 = 170$ and variances $\sigma_1^2 = \sigma_2^2 = 900$. Explain how the Stauffer-Grimson algorithm will update the model parameters. If $f(x, y, t+1) = 120$ at some pixel $(x, y)$, what are the parameter values at $t + 1$ if the running average feedback weight is $\alpha = 0.1$?
       [7]

4. (a) What is the output of applying the 1D filter kernel, $h(x) = \{-1, 4, -1\}$, using a convolution operation to the following image matrix? Explain how you deal with pixels on the boundary.

       $$\begin{bmatrix}
       0 & 0 & 0 & 0 \\
       0 & 4 & 4 & 0 \\
       0 & 4 & 4 & 0 \\
       0 & 0 & 0 & 0
       \end{bmatrix}$$

       [3]

   (b) Describe how an estimate of edge orientation and edge strength can be produced by using a pair of convolution kernels,

       $$R_1 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad R_2 = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$

       [6]

   (c) A 2D Gaussian filter can be defined as

       $$g(x, y; \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right).$$

       i. What is the effect of changing $\sigma$ when $g$ is used on an image? What is a good size of kernel to use if, say, $\sigma = 2$?
          [4]

       ii. Show how $g$ can be made into an edge detector for vertical edges. What does $\sigma$ do in this case?
           [6]

       iii. The 2D LoG operator takes the form

            $$\nabla^2 g(x, y, \sigma) = -\frac{1}{\pi\sigma^4}\left[1 - \frac{x^2 + y^2}{2\sigma^2}\right] g(x, y; \sigma).$$

            Show that the 3 × 3 kernel

            $$\begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

            is a fairly reasonable approximation.
            [6]

5. (a) What are the properties of good visual features and why?
       [5]

   (b) What are key-points and what are feature descriptors? Given two feature sets, $P = \{p_i(x, y)\}$ and $Q = \{q_j(x, y)\}$, with $M$ and $N$ features respectively, how can nearest-neighbour matching be used to find out whether $P$ and $Q$ contain the same object?
       [5]

   (c) The following expression calculates the homogeneous coordinates of image points for a pin-hole camera:

       $$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

       Give a sketch of the geometry implied by this equation, explaining the role of $f$.
       [3]

   (d) Using diagrams and equations, explain what the extrinsic and intrinsic parameters of a camera are. Why is camera calibration useful in image and video analysis?
       [5]

   (e) The combined homogeneous camera matrix $M$ with 11 unknown parameters,

       $$M = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & 1 \end{bmatrix},$$

       takes world coordinates, $\mathbf{X}$, onto image coordinates, $\mathbf{x}$. Show that given a set of point pairs $\{\mathbf{X}_i, \mathbf{x}_i\}$, the camera matrix can be solved using linear least squares. What is the minimum number of points required to obtain $M$?
       [7]

6. (a) A Perceptron takes two-dimensional inputs, $\mathbf{x}$, produces scalar outputs, $y$, and has the following design:

       [Figure: Perceptron design]

       In the design, the activation function is linear: $f(z) = z$. To train the weights, a loss function, $L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$, is used.

       i. Perform a forward pass to calculate the output given the initial input and set of weights:

          $$\mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathbf{w} = \begin{pmatrix} 0.5 \\ 0.5 \\ -2 \end{pmatrix}$$

          What is the loss if the true value corresponding to the current input is $\hat{y} = 1$?
          [1]

       ii. Write down an expression for $\frac{\partial L}{\partial \mathbf{w}}$ and hence determine the proportional gradient step, $\Delta \mathbf{w}$, which will reduce the loss, given the single sample pair $\{\mathbf{x}, y\}$. What are the new weights at the second epoch if the learning rate is set to $\frac{1}{4}$?
           [4]

       iii. What is the forward pass and the weight update if the activation function is a ReLU?
            [4]

   (b) Explain how Conv and Max-Pooling layers work. Including biases, what is the total number of weights of a 2D Conv layer with ten 3 × 3 filters if the input size is 28 × 28?
       [4]

   (c) Look carefully at the summary table description of a CNN intended for classification:

       i. Give an interpretation of what the network is likely to learn from images. What can be said about the feature classification capabilities of the fully connected part? What activation would you recommend for the output layer and why?
          [6]

       ii. This network is known to be overfitting on some data. Explain the phenomenon of overfitting and what strategies can be employed to prevent overfitting.
           [6]

End