IST 387/687 Final Project – Spring 2020

Final Project Submission Deliverables:
• One 10-15 slide presentation that summarizes your analysis. The audience for this presentation is the executive-level leadership of an airline company. Assume these executives do not know much about statistics: do not quote terms like “R-squared” or “p-value”, but describe your statistical results in plain language.
• One MS-Word file containing a detailed report of all your work. This report should include sections for all the phases of data science discussed in this course. A suggested template of the deliverables will be provided. The audience for this report is your Data Science professor and your lab professor, who understand R code and Data Science. Please make sure to include all assumptions made and any analysis completed, whether you found it significant or not.

Rules of Engagement: This is an honor system assignment. You may consult with IST687 professors and Faculty Assistants (FAs), the textbook, and publications on the Internet at any time. You may not consult, collaborate with, or seek assistance from any other human besides me. Your attribution statement, at the top of your R-code file, must reflect these constraints. You may not share your results or work in progress with any other human besides professors and Faculty Assistants (FAs). Note that your data file is unique to you: the results that other students in IST387/687 obtain will be different from yours. For 687 students, project updates are due on the dates provided, to ensure you are on track and have no outstanding questions.

Project Goal: The goal of this term project is for you to use all of the skills you have developed in the IST387/687 labs and homework assignments to make sense of a novel dataset, to perform some essential analyses on the dataset, and to explain and document what you have done.
The dataset contains summaries of air travel within the U.S., one row per customer per trip.

Accessing Your Data File: The data will be available to you. The file contains about 32 columns/variables. Each row represents one customer’s airplane trip from an origin to a destination.

Recommended Project Phases

Data Pre-processing / Data Preparation Phase
• Phase 1: Mitigate Missing Data. There are several columns in the dataset that may contain missing data. Write code that examines each column to see if it contains missing data. To mitigate missing data, use mean substitution for numeric variables. Use comments in your code to document how many missing data values you had to repair.
• Phase 2: Summarize Variables. For each numeric variable, create a histogram. Add a comment that describes the shape of the histogram as symmetric, positively skewed (long right tail), or negatively skewed (long left tail). For each factor variable (e.g., Gender), use the table() command to summarize how many observations are in each category.

Exploratory Analysis Phase
• Phase 3: Predictive Modeling. Many columns contain data relating to the characteristics of each customer’s trip. Using the modeling techniques we learned in class (Linear Modeling, Association Rules, SVM), develop 3-5 different predictive models that analyze the data.
• Phase 4: Map Low Satisfaction Routes. Subset your data to create a smaller data set containing only the trips where customers reported the lowest levels of satisfaction. The latitude and longitude of each origin and destination are included in the data set. Use ggplot to place route curves onto an outline map of the U.S. states. The geom_curve() geometry supports this kind of plotting.

Business Recommendations Development Phase
• Phase 5: Make Sense of Low Satisfaction Segments. The client wants to know why customers become dissatisfied with their air travel. Use insights from Phase 3 and Phase 4 to explain why certain trips have low satisfaction.
Conduct any appropriate follow-up analyses to provide evidence for your ideas. Make sure to document any additional code with appropriate comments.
• Phase 6: Develop Marketing Plan. Identify three interesting Market Segments. Define the demographic characteristics associated with each Market Segment. Finally, recommend three ideas for each segment that you believe would increase the NPS for the segment.

Your presentation should provide the client (presumably the executive leaders of the airline) with an explanation of your results in language that is suitable for an executive to understand. Your report should contain the data and visualizations that support the insights and recommendations you are trying to communicate to the client.
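
As a starting point for Phases 1 and 2, the steps described there (checking each column for missing data, mean substitution for numeric variables, a histogram per numeric variable, and table() for factor variables) might be sketched in base R roughly as follows. This is only an illustrative sketch: the data frame `trips` and the column names `Age`, `Flight.Distance`, and `Gender` (other than Gender, which the assignment mentions) are hypothetical stand-ins, and your own data file will have different columns and values.

```r
# Toy data frame standing in for the real data file.
# Column names and values are hypothetical, for illustration only.
trips <- data.frame(
  Age = c(34, NA, 52, 41, NA, 29),
  Flight.Distance = c(500, 1200, NA, 800, 950, 300),
  Gender = factor(c("Female", "Male", "Male", "Female", "Female", "Male"))
)

# Phase 1: count the missing values in every column.
na_counts <- colSums(is.na(trips))
print(na_counts)

# Phase 1: mean substitution, applied to numeric columns only.
numeric_cols <- names(trips)[sapply(trips, is.numeric)]
for (col in numeric_cols) {
  col_mean <- mean(trips[[col]], na.rm = TRUE)
  trips[[col]][is.na(trips[[col]])] <- col_mean
}

# Phase 2: one histogram per numeric column.
# Describe each shape (symmetric, positively or negatively skewed) in a comment.
for (col in numeric_cols) {
  hist(trips[[col]], main = col, xlab = col)
}

# Phase 2: category counts for a factor column.
gender_counts <- table(trips$Gender)
print(gender_counts)
```

Counting the missing values before repairing them keeps the numbers you need for the "how many values were repaired" comments. Looping over the numeric columns, rather than naming each one, avoids having to hard-code all of the roughly 32 variables.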