
Tutoring case: CSCI 540

May 15, 2020

CSCI 540: Machine Learning, Spring 2020
Project #1: Anatomy of a Model
Total: 100 points
Out: 1/14/2020
Due: 1/21/2020 @ 11:59:59 pm

Goal

Students will reinforce concepts concerning the key aspects of a machine learning model, namely the hypothesis, the learning algorithm, the target function, and sample data. Students will also familiarize themselves with the MATLAB computing environment.

Tasks

In class, we discussed the anatomy of a learning model as consisting of a target function, data samples, a hypothesis set, and a learning algorithm. The hypothesis set consists of a family of functions that take a specific form. It is the job of the learning algorithm to search through the hypothesis set for the hypothesis, h, that best approximates the target function, f. Upon finding the final best hypothesis, g, the learning algorithm halts.

Assume the following hypothesis set:

• Line in the plane, H1: $h(x) = w_2 x_2 + w_1 x_1 + w_0 x_0$

For H1 we have $x_0 = 1$ and $w_i \in \mathbb{R}$ for $i = 0, 1, 2$. Each hypothesis in H1 is described by a parameter vector $\vec{w} = (w_0, w_1, w_2)$. By this definition, the kth hypothesis in H1 is described by a point in 3-dimensional parameter space. For example, the parameter vector (1, 2, 3) corresponds to $h(x) = 3x_2 + 2x_1 + 1x_0$.

Your hypotheses will be used to perform classification on a two-class problem using the classifier we discussed in class, namely $h(\vec{x}) = \mathrm{sign}(\vec{w}^{T}\vec{x})$, which returns +1 or -1 as a prediction.

You are given a data set generator that creates synthetic data and writes it to a comma-separated value (CSV) data file. The data consists of two-dimensional features for a two-class problem, namely the positive class (+1) and the negative class (-1). The data generator is included on the course Blackboard to show you how the data is generated. For your assignment, do not change the parameters of the data generator and do not change the number of data points. A data file has been provided; it was created by the data generator MATLAB code.
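The line hypothesis and the sign classifier described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the assignment itself requires MATLAB), the helper name `predict` is hypothetical, and mapping a score of exactly zero to +1 is an assumption, since the handout does not say how ties are handled.

```python
# Illustrative sketch in Python (the assignment itself requires MATLAB).
# `predict` is a hypothetical helper name, not part of the course materials.

def predict(w, features):
    """Classify one point with h(x) = sign(w0*x0 + w1*x1 + w2*x2), x0 = 1.

    w        -- parameter vector (w0, w1, w2)
    features -- the two features (x1, x2) of a data point
    Returns +1 or -1. A score of exactly 0 is mapped to +1 here, which is
    an assumption; the handout does not say how ties are handled.
    """
    x = (1.0, features[0], features[1])  # prepend the dummy variable x0 = 1
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


# The handout's example vector (1, 2, 3) gives h(x) = 3*x2 + 2*x1 + 1*x0.
w = (1.0, 2.0, 3.0)
print(predict(w, (1.0, 1.0)))    # score = 1 + 2 + 3 = 6
print(predict(w, (-1.0, -1.0)))  # score = 1 - 2 - 3 = -4
```

Storing the weights in the order (w0, w1, w2) mirrors the handout's parameter vector, so each hypothesis is a single point in parameter space.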
The data generator is provided to you for pedagogical reasons. Please note that, in general, you have no knowledge of the target function and no control over the data samples, as we have discussed in lecture.

Each data point consists of two features and a class label. In the CSV file, the first column contains the first feature, the second column contains the second feature, and the third column contains the class label. Each datum from the CSV file is represented as a vector $\vec{x}_i = (x_{i,1}, x_{i,2})$; that is, the first feature of the ith vector is $x_{i,1}$ and the second feature is $x_{i,2}$. Remember that a model is constructed using a dummy variable $x_0 = 1$.

Using a representation for the parameter vectors for each hypothesis set, devise a scheme for finding the “best” model for the synthetic data set. This will involve:

1. In MATLAB, devise a 2-dimensional parameter space representation for H1.
2. In MATLAB, devise a method for initializing the starting hypothesis in H1.
3. In MATLAB, devise a method for evaluating or computing the error for a hypothesis in H1. Error corresponds to the number of data points for which the classifier is incorrect.
4. In MATLAB, using your evaluation method, devise a method for searching through hypotheses in H1 for the best hypothesis. You will run your search for a fixed maximum number of iterations.
5. Plot the data set along with the final best hypothesis.

Note: You may use any search method of your choosing. You are well served to keep it simple.

Questions (written as PDF or MS-Word only)

1. Discuss the approach you took to search the hypothesis space for H1.
2. Does your best hypothesis from H1 change when you re-run your learning algorithm? Why or why not?

Submission

1. Create a single ZIP archive (no tar, gz, rar, or 7-zip) consisting of your written work and your MATLAB code. Your MATLAB code MUST include everything needed to run your submission, including data file(s). Written work must be PDF or MS-Word only. Other formats will not be accepted and will receive a zero.
2. Test your submission by unzipping your code and verifying that it runs. Code that does not run will not be graded and you will receive a zero for it.
3. Submit your assignment using BLACKBOARD only! Email submissions will not be accepted and you will receive a zero.
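The error count in task 3 and the fixed-iteration search in task 4 can be illustrated with a short sketch. It is one possible approach under stated assumptions, not the course's reference solution: it is written in Python rather than the required MATLAB, the names `predict`, `count_errors`, and `random_search` are hypothetical, pure random search is just one simple search method among many, and the toy data is made up for the demo rather than taken from the course data file.

```python
import random

# Illustrative sketch in Python (the assignment itself requires MATLAB).
# All names below (predict, count_errors, random_search) are hypothetical,
# and the tiny data set is made up for the demo; it is not the course data.

def predict(w, features):
    """h(x) = sign(w0 + w1*x1 + w2*x2); a score of 0 maps to +1 (assumption)."""
    score = w[0] + w[1] * features[0] + w[2] * features[1]
    return 1 if score >= 0 else -1


def count_errors(w, data):
    """Number of points (x1, x2, label) that hypothesis w misclassifies."""
    return sum(1 for x1, x2, label in data if predict(w, (x1, x2)) != label)


def random_search(data, max_iters=1000, seed=0):
    """Keep the best of the randomly drawn parameter vectors.

    Drawing each weight uniformly from [-1, 1] is an arbitrary choice made
    for this sketch. The seed fixes the random sequence, so repeated runs
    with the same seed return the same hypothesis.
    """
    rng = random.Random(seed)
    best_w = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))  # initial hypothesis
    best_err = count_errors(best_w, data)
    for _ in range(max_iters):
        if best_err == 0:
            break  # nothing left to improve
        w = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        err = count_errors(w, data)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Made-up, linearly separable toy data: (feature 1, feature 2, class label).
toy_data = [
    (2.0, 2.0, 1), (1.5, 3.0, 1), (3.0, 1.0, 1),
    (-2.0, -1.0, -1), (-1.0, -3.0, -1), (-3.0, -2.0, -1),
]
best_w, best_err = random_search(toy_data, max_iters=1000, seed=0)
print("best w:", best_w, "errors:", best_err)
```

The search only ever replaces the current best hypothesis when a candidate makes strictly fewer errors, so the reported error count never increases from one iteration to the next.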

Author: admin