Page 1 of 6 Academic Year 2018-19 Examination Period Semester 2 Module Code: EL3300 Module CRN: 47788 Exam Type: Examination Module Title: Machine Intelligence Module Leader: Carl Berry Examination Paper Structure There are 5 questions in total. There are no appendices. The mark obtainable for a question or part of a question is shown in brackets alongside the question. Students to be provided with: answer books Instructions to Students: The time allowed to complete this examination paper is 2 hrs Answer 3 questions. The use of calculators is permitted in this examination. The use of dictionaries is not permitted in this examination. This question paper must not be removed from the examination room. Page 2 of 6 Question 1) a) Consider the table of costs of moving from one point to another below : From\To A B C D E F G H I J K L A – 10 – 5 – – – – – – – – B – – 5 – – – – – – – – – C – – – – – – 15 – – – – – D – – – – 8 – – – – – 25 – E – – – – – 9 – – – 5 – – F – – – – – – – – 14 – – – G – – – – – – – 20 – – – – H – – – – – – – – – – – 15 I – – – – – – – – – – – 15 J – – – – – – – – – – – 10 K – – – – – – – – – 30 – – L – – – – – – – – – – – – Where “–“ indicates no path exists between the 2 points in that direction. Given these costs draw the digraph associated with the above table. (6 Marks) b) Using the costs from the table in part a and the distances from point L in the table below use an A* search algorithm to find the best path from A to L. Point Distance From L Point Distance From L A 100 G 95 B 95 H 50 C 90 I 40 D 95 J 70 E 90 K 60 F 90 L 0 (12 Marks) c) How does the M* search differ from the A* search? Under what circumstances might it be more beneficial, what additional problems does it consider and how does it deal with these? (7 Marks) Page 3 of 6 Question 2) a) The following is a high level overview of the backpropagation training algorithm as used in a multi-layered perceptron network (MLP), assuming the MLP has a single hidden layer: • Step 1 : Initialize weights. • Step 2 : While stopping condition is false do steps 3-8. • Step 3 : Input nodes send signals down connections to hidden layer. • Step 4 : Each hidden node sums its inputs, applies its activation function and sends the signal to the next layer. • Step 5 : Each output node sums its inputs and uses its activation function to compute its output. • Step 6 : Each output node receives a target value (supervised training) and compares this with its actual output along with the derivative of the activation function to calculate its weight correction term. • Step 7 : ????? • Step 8 : ????? • Step 9 : Test stopping condition. i) Briefly explain a common strategy that is adopted for step 1 and why this approach is taken. (3 Marks) ii) What are the missing operations at steps 7 and 8 ? (NB this is a high level overview , your answer does not require any equations.) (7 Marks) iii) Part of step 6 states “Each output node receives a target value (supervised training) and compares this with its actual output…”. How does this part in a MLP differ from the training of a standard perceptron network ? (2 Marks) iv) Give three likely stopping conditions that may be testing for in step 9, state how these different criteria select the most appropriate network once the criteria is met and whether this requires any additional tracking on behalf of the system. (9 Marks) b) The backpropagation learning algorithm is an example of a greedy algorithm, what does this mean, what problem can it cause and what typical strategies do MLPs use to avoid the problem. (4 Marks) Page 4 of 6 Question 3) a) The table below shows a number of attributes of mobile robots with faults. ID TempSensor VibSensor Tracks Engine Fault 001 High Low N Electric 1 002 High Low Y Petrol 1 003 Low High Y Electric 2 004 High High Y Petrol 3 005 Low High N Petrol 2 006 Low High Y Petrol 3 007 High Low N Electric 1 008 Low Low Y Petrol 3 In order to predict what fault will occur within a robot a set of rules have been created : R1 : (TempSensor = High) AND (VibSensor = Low) -> 1 R2 : (TempSenosr = Low) AND (VibSensor = High) -> 2 R3 : (Tracks = Y) AND (Engine=Petrol) -> 3 i) These rules are neither mutually exclusive and exhaustive. Explain why in both cases providing an example. (4 Marks) ii) What are the possible problems of the rule set not being mutually exclusive and exhaustive and how could they be solved ? (6 Marks) b) There are a number of ways in which we can measure the effectiveness of rules in a rule set, two such metrics are precision and recall. Explain what these metrics represent and give an example of when they might be considered more important than an accuracy measure. (4 Marks) c) In the labs we have used Weka to look at various machine learning algorithms, one of the very useful metrics that Weka reports back for some classification algorithms is the confusion matrix. By means of a suitable 3 class example explain what a confusion matrix is and why it is particularly useful when choosing how to improve the algorithm. (11 Marks) Page 5 of 6 Question 4) a) The network below is a Hebb Network : If the weights and bias are initially set to 0, and the weight change is given by : ΔWi = Xi Y ΔB = Y Where i = 1,2. Show that a Hebb network can learn an OR gate after a single epoch. (14 Marks) b) In many ways Hebb Nets and a standard perceptron network are very similar, what important difference was made to the perceptron’s training algorithm that makes it different to a Hebb net ? (2 Marks) c) Explain why neither a Hebb net nor a standard perceptron network can learn to simulate an XOR gate, and state what changes where made to multi-layered perceptrons so that they could. (4 Marks) d) An even earlier example of Artificial Neural Networks (ANNs), the McCulloch-Pitt Neuron, can represent an XOR gate. Why were these types of networks not used instead of Hebb nets and Perceptrons ? (2 Marks) e) For input values to ANNs, and many other machine learning algorithms, it is common to use either binary inputs or to normalise the inputs. What is meant by the term “normalise” in this context, how is it done and why are these approaches taken ? (3 Marks) Page 6 of 6 Question 5) a) Explain the difference between supervised, unsupervised and semi supervised training and briefly explain how it affects the learning of an algorithm. (6 Marks) b) What advantages does semi supervised training have over supervised training and why is this becoming more common place ? (3 Marks) c) Optimisation strategies such as Genetic Algorithms and Particle Swarm Optimisers aim to find a “near optimal” solution or one that is “good enough”. Why can these methods not guarantee to have found the best solution ? (2 Marks) d) In terms of machine learning many algorithms are affected by “the curse of dimensionality”. What is this and what effect does it have ? (3 Marks) e) A common training strategy is a K cross folds validation strategy, where K is quite often set to 10. Explain how this strategy works. (7 Marks) f) How does the “leave one out” strategy differ from this ? What are the advantages and the disadvantages of a “leave one out” training strategy ? (4 Marks) End of Question Paper.