

May 15, 2020

MTH783P Time Series Analysis for Business
Spring 2020
Assessed Coursework
Silvia Liverani

The dataset for this assessment is a modified version of the Air Quality Data Set from the UCI Machine Learning Repository. This dataset contains the responses of a gas multi-sensor device deployed in the field in an Italian city. The dataset contains 390 instances of daily responses from an array of several metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located in the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to April 2005 (one year). Ground Truth daily averaged concentrations for Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) are provided, together with information on weather conditions. Missing values are tagged with the value -200. The description of the variables is available in Table 1.

Table 1: Description of the variables

Date    Date (dd/mm/yyyy)
NOx     True hourly averaged NOx concentration in ppb
NO2     True hourly averaged NO2 concentration in microg/m3
Temp    Temperature in °C
RH      Relative Humidity (%)
AH      Absolute Humidity

The dataset is available on QMplus as AirQualityCoursework.csv. Use R to analyse the dataset and address the following tasks.

1. (5 points) Split the data into two datasets: a training dataset and a test dataset. The aim will be to use the training dataset to forecast the daily NOx concentration in January 2005. Therefore, the training dataset should include the first 296 observations (until the last observation of 2004). The test dataset should include the 31 daily observations for January 2005.
2. (25 points) Explore the training dataset: plot and produce summary statistics to identify the key characteristics of the data and produce a report of your main findings.
   The topics that you might choose to discuss include: possible issues with the data collection, identification of possible outliers or mistakes in the data, role of missing data (if any), distribution of the variables provided, relationships between variables.
3. Fit a statistical model to the training data and use it to forecast the NOx concentration every day in January 2005.
   (a) (20 points) How did you decide which model to fit? Include details of other models that you tried, if any.
   (b) (10 points) What are the underlying assumptions of the model that you have chosen? Carry out a residual analysis to ensure that the assumptions are satisfied.
   (c) (10 points) Forecast the NOx concentration every day in January 2005 and discuss the results.
   (d) (10 points) Discuss any weaknesses of this analysis.
4. (10 points) All tables and plots that you include in your report should be reproducible. Therefore, include in your submission on QMplus a text file with the R commands that can be used to reproduce your results, including tables and plots. This text file should include all and only the lines of code used to produce results presented in the report, and it should be written in a clear and readable way.
5. (10 points) Marks will be given for the overall presentation of the coursework, the quality of figures and writing. All modelling and forecasting choices and assumptions must be justified.

Requirements for the coursework submission:

• The submission deadline is 15:00 on Tuesday 28th April.
• The submission should include a document in .pdf format containing the answers to questions 2 and 3 (with a 3-page limit, including figures and discussions) and a text file (with extension .txt) containing the R code used for the results presented in the report. Minimum font size is 12.
• While discussing the coursework with your classmates is encouraged, the submission must be your own independent work.
  Every submission will be checked for plagiarism using an automated system. Please refer to the QMUL Academic Regulations for more information about the definition of plagiarism and the related penalties: chapterid=103814
• The policy for late submissions of the School of Mathematical Sciences will be used. You can read the policy here: id=1007932&chapterid=103810
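
As an illustration of task 1 and of the missing-value convention described above, the following is a minimal R sketch, not a model answer. It assumes that the column names match Table 1, that the rows are already in chronological order, and that the row counts (296 and 31) are those stated in task 1. The helper name prepare_air_quality is hypothetical and was chosen only for this example.

```r
# Sketch only: the function name is hypothetical; column names follow Table 1.
prepare_air_quality <- function(aq, n_train = 296, n_test = 31) {
  # Dates are given as dd/mm/yyyy (Table 1).
  aq$Date <- as.Date(aq$Date, format = "%d/%m/%Y")

  # Recode the missing-value tag -200 as NA in every numeric column.
  num_cols <- vapply(aq, is.numeric, logical(1))
  aq[num_cols] <- lapply(aq[num_cols], function(x) {
    x[x %in% -200] <- NA
    x
  })

  # Assumes rows are in chronological order: first n_train rows for training,
  # the following n_test rows for testing.
  list(
    train = aq[seq_len(n_train), ],
    test  = aq[n_train + seq_len(n_test), ]
  )
}

# Usage, run only if the coursework file is present in the working directory.
if (file.exists("AirQualityCoursework.csv")) {
  aq    <- read.csv("AirQualityCoursework.csv", stringsAsFactors = FALSE)
  parts <- prepare_air_quality(aq)
  print(range(parts$train$Date))  # should end on the last observation of 2004
  print(range(parts$test$Date))   # should cover January 2005
  print(colSums(is.na(parts$train)))  # missing values per variable
}
```

Printing the date ranges is a quick way to confirm that the split matches the task description before any modelling is attempted. Recoding -200 as NA first keeps the tag from distorting summary statistics and plots.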


