Tutoring Case: EXAM 2

May 15, 2020

EXAM 2
MSCI:3250 SPRING 2020

• This exam comprises 4 questions and is worth 150 points
• You will have 90 minutes to complete this exam
• Early submission bonus: Submissions before 10:05 PM will receive a 3 point bonus
• Late penalty: Submissions after 10:15 PM will lose 3 points for each minute late
• The exam is open-book, open-notes, and you can use the Internet, but no communication with other individuals (with the exception of the instructor) is allowed
• By taking this exam, you agree to abide by the Tippie Honor Code and Honor Pledge below

HONOR PLEDGE

On my honor, I pledge that during this examination I have neither given nor received assistance, and that I did not have advance knowledge of the exam content.

Specifications

• Submit an R script file with your code for all questions
  o Name your file “lastname_exam2.R”
  o Add a comment with your name at the top of the file and comments denoting each question number
  o You may add other comments for clarification
  o Add the command rm(list=ls()) at the top of your file to clear the workspace
• The solution for each question must be generated as R variables or plots with specific names as instructed
  o All solutions should be generated by running your code without any customization or modification by the instructor
  o Load required packages with the library() command. Your script should not include any unnecessary packages or install() commands
  o Assume all input files are in the working directory. Do not include the setwd() command in your script

Background

April 20th has become known as “Weed Day”, prompting annual celebrations and rallies across the country. For this exam, we will analyze crime and demographic data from Denver, CO, where recreational marijuana has been decriminalized. Carefully review all provided files (“mj_crimes1.csv”, “mj_crimes2.csv”, and “neighborhoods.txt”) before beginning. Then answer the following questions:

1. (30 points) Read “mj_crimes1.csv” and “mj_crimes2.csv” into data frames and then merge them. Do not convert strings to factors. Treat empty cells as missing values.
Output variables:
  o part1 (10 pt): data frame created from “mj_crimes1.csv”
  o part2 (10 pt): data frame created from “mj_crimes2.csv”
  o crime (20 pt): data frame created by vertically merging part1 and part2 (Hint: Assume that the column names in part1 are correct. Should produce a data frame with 1,203 rows and 12 columns)

2. (50 points) Read “neighborhoods.txt” into a data frame (do not convert strings to factors, treat empty cells as missing values). Then merge with crime.
Output variables:
  o nbhd (13 pt): data frame created from “neighborhoods.txt”
  o mj_df (47 pt): data frame created by horizontally merging crime and nbhd. Only include neighborhoods that have crime reports. (Hint: Make sure that the values in the shared columns match. There are 2 values in crime that should be corrected. Assume that the values in nbhd are correct. Should produce a data frame with 1,203 rows and 21 columns)

3. (30 points) Analyze marijuana industry-related crimes.
Output variables:
  o industry_table: frequency table that counts the number of crime reports that were industry or non-industry related for each offense category. Display the offense categories as rows and industry vs. non-industry as columns.
  o industry_mod: logistic regression that models the likelihood of a crime being marijuana industry-related based on whether the crime was violent, plus the neighborhood’s population, age, vacant housing units, and home value
  o industry_r2: calculate the pseudo R² for industry_mod using the following formula:
      pseudo R² = 1 − D_full / D_null
    where D_full represents the deviance of the full model and D_null represents the deviance of the intercept-only model

4. (40 points) Analyze crime reports by neighborhood.
Output variables and plots:
  o nbhd_summary: dplyr summary table that calculates the total crime reports (“TotalReports”), median age (“MedianAge”), and median poverty rate (“MedianPoverty”) for each neighborhood
  o age_cor: use nbhd_summary to calculate the correlation between a neighborhood’s total crime reports and median age
  o Create a scatterplot that visualizes the relationship between a neighborhood’s total crime reports and poverty rate. Add appropriate labels/titles. Set the points to be shaped like a triangle (point up) and filled with the color green (Hint: use the pch argument to change the marker type)
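
The pseudo R² in question 3 compares the deviance of a fitted logistic regression with the deviance of an intercept-only model. The sketch below shows only the mechanics, and it does so on R's built-in mtcars data rather than the exam files. The names demo_mod and demo_r2 and the choice of predictor are assumptions made for this example and are not part of the exam.

```r
# Illustration only: built-in mtcars data, NOT the exam's crime data.
# demo_mod and demo_r2 are hypothetical names chosen for this sketch.

# Logistic regression on a binary outcome (am: 0 = automatic, 1 = manual)
demo_mod <- glm(am ~ wt, data = mtcars, family = binomial)

# A fitted glm object stores both quantities used by the formula:
#   $deviance      deviance of the fitted (full) model
#   $null.deviance deviance of the intercept-only model
demo_r2 <- 1 - demo_mod$deviance / demo_mod$null.deviance

print(demo_r2)
```

A value closer to 1 means the predictors reduce the deviance substantially relative to the intercept-only model, and a value near 0 means they add little.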


Author admin
