
Tutoring Case: IT270

May 15, 2020

Midterm #1 (ID:161) IT270 ALEXANDER PELAEZ

Instructions. If you use R, please copy and paste your code into Word using Courier font (it makes it easier to read). For each problem that asks for a response (not just a calculation), be sure to explain and interpret the results. If you aren't sure, please ask. Start each question on a new page and clearly label the start of each problem (for example, with a slightly larger font, bold face, or underlining), or anything else that will help me find the problem you are working on. Because this is a midterm, you must work on the problems individually. You may, however, discuss techniques and approaches. You may not copy code or answers; copying other students' code or answers could result in a major penalty or even failure on the exam.

1. Matrix Algebra (10 pts)
Given the following matrices:
A = 8 3 B = 4 1 C = 2 5 4 6 2 5 3 3 3 D = 4 1/3 9 13 12 1/3 1/4 E 4 3 10 4 6 12 Σ = 21 34 8 7 15 12 8 12 11
Calculate the following (DO NOT USE R):
1) BA
2) B'E
3) Find the determinant of E.
4) (AA)A'
5) Find the trace of matrix Σ. What does the trace of a covariance matrix represent?
6) Compute the correlation matrix ρ from the covariance matrix Σ.
7) Compute the eigenvalues and eigenvectors of the covariance matrix Σ.
8) Using matrix A above, prove that AA⁻¹ = I.

2. Principal Component Analysis
The head of an airport wants to identify issues related to efficiency and operations at the airport but is uncertain where to look. Given the datasets "airport_cancellations.csv" and "airport_operations.csv", conduct the following analysis.
a) First, merge the data into a combined dataset (see the merge function in R). Consider which columns you will need to merge the data on. What happened after you merged the data? Anything interesting? Why?
b) Are there any outliers or missing observations that need to be dealt with? How will you deal with them?
c) Conduct a principal component analysis.
After the PCA, how many dimensions are necessary? Explain how you determined this.
d) Are there any overlapping loadings? Explain what you will do with the overlapping loadings.
e) Name the dimensions you are left with.
f) Is this analysis reasonable? Are there any issues with your final dimension list?
g) Create new variables in your dataset and compute values for each PCA dimension. Then conduct a correlation analysis between the variables and each dimension. Why is this interesting?
h) Is there a correlation between the PCA values? Is this surprising?

3. Factor Analysis
It is well known that sports analytics is very popular. The dataset fifa.csv contains information about soccer (football) players obtained from FIFA19. It would be interesting to see whether the ratings used could be analyzed as a simple set of factors.
a) Identify the columns that are the best candidates for analysis as factors.
b) Conduct a factor analysis using these columns (if you get an error, reduce the number of factors in the function until the error disappears).
c) Justify your choice of rotation method. What does it tell you?
d) Are there any correlations between the factors?
e) Create a diagram of the factors (by hand or in PowerPoint); be sure to label everything.
f) Split the dataset into two parts (left-footed players and right-footed players). Conduct a factor analysis on each and explain whether you see any differences between right- and left-footed players. Note: you do not need to draw this one.

4. K-Nearest Neighbors
A supervisor wishes to conduct a classification analysis on the breast_cancer.csv dataset to see whether new observations can be properly classified.
a) Examine the dataset and explain why KNN might be used; discuss the benefits and the drawbacks of this algorithm.
b) Create a correlation plot of the relevant variables, and make an initial assessment of the relationships between each of the variables and the classifier.
c) Conduct a KNN classification using all of the relevant columns.
i) Discuss your initial approach, including how you will set it up and what the steps are, including the number of variables in your training set and the number of variables in your test set.
ii) Provide a measure of accuracy (since this is a straight classification, a simple assessment is enough).
iii) Reduce the number of variables from the original 10. How would you decide which variables might be better to use in this assessment?
iv) Conduct a KNN analysis using these variables, and determine a reasonable KNN model that uses the fewest columns while keeping reasonable accuracy. Defend your position.

5. Decision Tree
People analytics is becoming an increasingly popular and demanding area for data mining. The Head of Human Resources wants to identify reasons for attrition (people leaving, voluntarily or otherwise). The task is to develop a decision tree to determine this. You will need to merge three datasets (employee_survey_data.csv, general_data.csv, manager_survey_data.csv) to accomplish this task.
a) Examine the dataset and determine whether there are any issues. Produce histograms and summary statistics for the appropriate columns and identify any issues.
b) Determine and list any columns that are not needed in a first "full" model.
c) After examining the data, which methods (algorithms) do you think are appropriate?
d) Develop the decision tree model and compare the methods based on accuracy.
e) Provide a chart of your final model.

6. Theoretical Problems
a) Explain why the covariance matrix is important for analysis. Additionally, explain what eigenvectors and eigenvalues are and why they are important.
b) In machine learning, explain the importance of splitting the dataset. What are the different ways to split it, and how should an analyst split the data?
c) What are the differences and similarities between factor analysis and PCA? Focus on the equations that are produced.
d) What are some of the challenges with KNN? How would you advise someone who decides to use KNN?
e) What is the difference between orthogonal and oblique rotations? When would we choose either, or neither?
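Several quantities in Problem 1 (parts 5 to 8) and Problem 6a can be checked numerically once the hand calculations are done. Three relations are relevant. The trace of a covariance matrix is the total variance, which also equals the sum of its eigenvalues. Each correlation is ρ_ij = σ_ij / sqrt(σ_ii σ_jj). A product of the form AA⁻¹ should return the identity. The sketch below is one possible approach. It assumes Python with NumPy, although the course itself uses R. The helper names `cov_to_corr` and `summarize_covariance` are hypothetical, and the matrices `sigma` and `A` are illustrative examples, not the matrices given in Problem 1.

```python
import numpy as np

# Hypothetical helpers for checking hand calculations; not a prescribed course method.

def cov_to_corr(sigma):
    """Correlation matrix from a covariance matrix: rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    sigma = np.asarray(sigma, dtype=float)
    if sigma.ndim != 2 or sigma.shape[0] != sigma.shape[1] or not np.allclose(sigma, sigma.T):
        raise ValueError("a covariance matrix must be square and symmetric")
    sd = np.sqrt(np.diag(sigma))        # standard deviation of each variable
    return sigma / np.outer(sd, sd)     # divide entry (i, j) by sd_i * sd_j


def summarize_covariance(sigma):
    """Return the trace, the correlation matrix, and the eigenpairs (largest eigenvalue first)."""
    sigma = np.asarray(sigma, dtype=float)
    total_variance = np.trace(sigma)            # sum of the diagonal variances
    eigvals, eigvecs = np.linalg.eigh(sigma)    # eigh: symmetric input, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]           # reorder descending, as PCA reports them
    return total_variance, cov_to_corr(sigma), eigvals[order], eigvecs[:, order]


# Hypothetical covariance matrix (illustrative only, not the Sigma from Problem 1)
sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 9.0, 3.0],
                  [0.0, 3.0, 16.0]])
trace, rho, lam, vecs = summarize_covariance(sigma)

print("trace (total variance):", trace)       # 4 + 9 + 16 = 29
print("sum of eigenvalues:", lam.sum())        # equals the trace up to rounding
print("correlation matrix:\n", np.round(rho, 3))

# Each column v of vecs satisfies sigma @ v = lambda * v
assert np.allclose(sigma @ vecs, vecs * lam)

# Hypothetical invertible A for the A A^{-1} = I check (part 8)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
assert np.allclose(A @ np.linalg.inv(A), np.eye(2))
```

`np.linalg.eigh` assumes a symmetric input, which a covariance matrix always is. Its eigenvalues come back in ascending order, so the sketch re-sorts them largest first. As a result, the first eigenvector points in the direction of greatest variance, which is the ordering used when choosing principal components.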

admin
