Tutoring case - CSC1001

CSC1001: Introduction to Computer Science Programming Methodology
Team Project

Project description:

This project is worth 20% of the final grade. It is a team project: every group must have 2 or 3 students (groups of more than 3 members are not allowed). Choose one of the datasets provided at the end of this document as the target of your project. You may try any approach that meets the requirements of the dataset. A decision tree is the baseline: you must write code that implements a decision tree without using third-party packages. Other approaches are also welcome, but the decision tree must be implemented first, and any other approach must also be implemented without relying on third-party packages. We suggest that you implement a Classification and Regression Tree (CART) to solve this problem. Details of CART can be found in references [1-4].

This team project will be assessed on the source code and a project report. Write the code for your project in .py files and provide a program entry file named main.py. Pack your report and source code into a single .zip file, name it using the student IDs of all group members (e.g. if the student IDs of your group members are 119000000 and 119000001, the file should be named 119000000_119000001.zip), and submit the .zip file via Blackboard. Submissions with an incorrect format or file name will not be accepted.

For the report:

The report can be written in LaTeX, MS Word or Markdown, and the format can be .doc, .docx or .pdf. There is no minimum length, but your report should include the following contents:
1. Description of the data and the question(s) that you are interested in answering.
2. Review of some of the approaches that you tried or thought about trying.
3. Summary of the final approach you used and why you chose that approach.
4. Explanation of the working principle and logic of the approach used.
5. Summary of the results.
6. Conclusions.
7. A clear statement of the contribution and workload of each member.

For the source code:
1. The source code must be placed in a folder named ‘src’.
2. The source code must be properly commented.
3. A plain text file called ‘readme.txt’ must be provided to explain how to use the source code. This file should also be placed in the ‘src’ folder.

For the dataset:

Files in folder
1. The ‘train.csv’ file contains the training data. Use the training data to build your model(s).
2. The ‘test.csv’ file contains the testing data, which is used to measure the performance of your final model. Do not use it until your model is fully trained and fixed. You should use the test data only once!
3. The ‘readme.txt’ file contains the description of the dataset.
4. Use the training data (train.csv) to select and tune models. You can also split ‘train.csv’ into two parts: one for model training and one for validation. Once your model is determined, use the test data (test.csv) to obtain the results.
5. The basics of machine learning can be found in the references at the end of this document.

The report and source code must be completed independently by the group members. Plagiarism, inappropriate discussion, etc. are strictly prohibited. Please note that teaching assistants may ask you to explain the meaning of your program to ensure that the code was indeed written by your group. Please also note that we may use Blackboard to check whether your program is too similar to your fellow students’ code.

This team project is due at 5:00 PM on 17 May (Sunday). For each day of late submission, you will lose 10% of your mark in this project. If you submit more than three days after the deadline, you will receive zero for this assignment.

Dataset 1: Google Play Store Apps

You can choose one of the following tasks:
1. Classification: predict whether the user rating of the app is above 4.5 or not.
   Evaluation: report accuracy = (number of correct predictions) / (total number of predictions).
2. Regression: predict the user rating of the app.
   Evaluation: report Mean Square Error (MSE) = $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $n$ is the number of test samples, $y_i$ is the true value and $\hat{y}_i$ is the predicted value.

Dataset 2: Red Wine Quality

You can choose one of the following tasks:
1. Classification: predict whether the quality score of the wine is above 6 or not.
   Evaluation: report accuracy = (number of correct predictions) / (total number of predictions).
2. Regression: predict the quality score of the wine.
   Evaluation: report Mean Square Error (MSE) = $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $n$ is the number of test samples, $y_i$ is the true value and $\hat{y}_i$ is the predicted value.

References
[1] https://en.wikipedia.org/wiki/Decision_tree_learning
[2] http://www.stats.ox.ac.uk/~flaxman/HT17_lecture13.pdf
[3] http://www.math.snu.ac.kr/~hichoi/machinelearning/lecturenotes/CART.pdf
[4] Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. CRC Press.
[5] Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
[6] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112, pp. 3-7). New York: Springer.
[7] Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.
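As an illustration of the two evaluation metrics and of the kind of split search a CART classifier performs, here is a minimal sketch that uses only the Python standard library, in line with the no-third-party-packages rule. All function names (`accuracy`, `mse`, `gini`, `best_split`) are hypothetical and not prescribed by the project. The sketch assumes Gini impurity as the classification split criterion and handles a single numeric feature only; a full tree would apply the search to every feature and recurse on the resulting subsets.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Search thresholds on one numeric feature (illustrative helper).

    Candidate thresholds are midpoints between adjacent distinct values.
    Returns (threshold, weighted_gini) for the split with the lowest
    weighted impurity, or (None, gini(labels)) if no split improves on
    the unsplit node.
    """
    n = len(values)
    best_threshold, best_score = None, gini(labels)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

For the regression task, the same search could be run with the variance of the target values in each subset in place of `gini`; that substitution is a common choice rather than something the project description specifies.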


Author admin