Tutoring Case: MCEN90048

MCEN90048 Artificial Intelligence for Mechatronics
Project 2: Recent Research and Applications of Artificial Intelligence

Contents
1 Summary
1.1 Topics
1.2 Submission and Due Dates
2 Project Background
2.1 Literature Review
2.2 Method Research
2.3 Industrial and Clinical Applications
3 Project Protocols
4 Expected Deliverables
5 Marking Criteria
Appendix A Topic Description
A.1 LR. Literature Review Projects
LR.1. Reinforcement Learning in Mechatronics
LR.2. Continual and Incremental Learning
LR.3. Recent Advances in Neuro Fuzzy Systems
LR.4. Fundamentals and Recent Advances of Transfer Learning
LR.5. Causality Inference in Deep Learning – Towards Artificial General Intelligence
LR.6. Neural Architecture Search – Identifying Optimal Neural Networks
LR.7. Anomaly Detection using Deep Neural Networks
A.2 MR. Method Research
MR.1. Data Generation using Generative Adversarial Networks
MR.2. Data Visualization with Unsupervised Learning
A.3 IC. Industrial and Clinical Applications
IC.1. & IC.2. COVID-19 Related Projects
IC.3. Incremental Learning in Industrial Mechatronics
IC.4. Positive Unlabeled Learning in Drug Repositioning
A.4 SS. Student-Suggested Projects

1 Summary

1.1 Topics

In this project, we invite you to form a group of one to four students to take part in recent research and real-world applications of artificial intelligence. We offer a selection of topics from which each group may choose one to form a project. The topics fall into four categories:

1. LR: Literature Review – investigate the theoretical and mathematical foundations, recent advances, current challenges and future directions of a topic.
2. MR: Methods Research – investigate existing methods for specific tasks, benchmark the methods, and explore possible improvements.
3. IC: Industrial and Clinical Applications of AI – apply cutting-edge artificial intelligence methods to real-world problems.
4. SS: Suggested by Students – propose your own project, set your own goals and accomplish them. A proposed project may belong to any of the above categories.

The table below lists 13 topics in three categories (LR, MR, IC). Your group needs to choose one of these 13 topics to form a project, or you may propose your own project. Please read Section 2 for the background of LR, MR, and IC projects. For more details on each specific project and related resources, please read Appendix A.
Type  No.  Topic (listed in no particular order)                                            Coordinator
LR    1    Reinforcement Learning in Mechatronics                                           Saman
LR    2    Continual and Incremental Learning                                               Damith
LR    3    Recent Advances in Neuro Fuzzy Systems                                           Saman
LR    4    Fundamentals and Recent Advances of Transfer Learning                            Saman
LR    5    Causality Inference in Deep Learning – Towards Artificial General Intelligence   Richard
LR    6    Neural Architecture Search – Identifying Optimal Neural Networks                 Richard
LR    7    Anomaly Detection using Deep Neural Networks                                     Saman
MR    1    Data Generation using Generative Adversarial Networks                            Richard
MR    2    Data Visualization with Unsupervised Learning                                    Damith
IC    1    COVID-19 Pneumonia Diagnosis Using Chest X-Rays                                  Richard
IC    2    Kaggle Competition: Forecasting Global Daily Confirmed COVID-19 Cases            Richard
IC    3    Incremental Learning in Industrial Mechatronics                                  Damith
IC    4    Positive Unlabeled Learning in Drug Repositioning                                Saman
SS    -    Projects Suggested by Students                                                   -*

* For SS projects, you may invite any one of the three coordinators to be your group coordinator.

1.2 Submission and Due Dates

Group formation
This project is expected to be completed by a group of one to four students. The group should be finalized by 6:00 pm on Wednesday 6th May 2020. To register, please go to Canvas – People – Groups, join one of the groups with your teammates, and reply to us on Canvas – Discussion with the name of your group. If you wish to propose your own project, the topic should be ready by this time. A short project proposal on the topic is appreciated so that we can evaluate the feasibility of the proposed project.

Expected deliverables
The outcomes of this project depend on the topic you choose. They may include:

1. A project proposal, half a page to one page, applicable to SS projects only; due by 6:00 pm on Wednesday 6th May 2020.
2. An initial report, one to two pages, applicable to all projects; due by 6:00 pm on Friday 15th May 2020. The initial report may be extended into the final report.
3. A short presentation submitted as a video recording, ideally six to eight minutes, applicable to all projects; due at 6:00 pm on Sunday 24th May 2020.
4. Python code and result files, applicable to MR and IC projects; due by 6:00 pm on Friday 5th June 2020.
5. The submission result on the Kaggle website, applicable only to Project IC.2 Kaggle Competition: Forecasting Global Daily Confirmed COVID-19 Cases; due at 6:00 pm on Friday 5th June 2020.
6. A final report, four to five pages, applicable to all projects; due by 6:00 pm on Friday 5th June 2020. This may be extended from the initial report.

Outcomes 2 to 6 are submitted via Canvas – Assessment. The project proposal (Outcome 1, SS projects only) should be sent by email to both Richard and Damith. A detailed explanation of each topic is provided in Appendix A.

2 Project Background

This assignment aims to give students experience of research and real-world applications of artificial intelligence related to Mechatronics and adjacent fields. In this section, we introduce the background for each type of project in this assignment.

2.1 Literature Review

A literature review is a critical analysis of published literature that presents the current knowledge on a particular topic, including substantive findings as well as theoretical and methodological contributions. It is an assessment of the literature that provides a summary, classification, comparison and evaluation, and it is often the first step towards solving any complex real-world problem. Although literature review is offered as a project category in this assignment, all other projects also include a portion of reviewing existing methods.

For groups choosing a topic from the LR category, each student is expected to read in detail at least 5 research papers from the recent decade. You are required to understand the overarching mathematical principles and problem formulation, and to categorize the literature according to its high-level ideas.
This should also be complemented with your logical assessment of the state-of-the-art methods, the status quo of the domain, and potential improvements or gap areas.

For MR and IC topics, students will also read recent research papers, identify the cutting-edge methods, and apply them in practice. The goal of the literature review in this case is to justify your selection of methods or analysis procedures. For instance, you may find a rationale in the literature for using certain pre-processing techniques, model architectures and training methods. To support the rigor of these selections, you are free to cite existing literature and findings. Conversely, any such choice in your approach should be well supported in the literature where applicable.

2.2 Method Research

The second step towards solving any complex real-world problem is to select the set of tools for that problem or similar ones. In the case of artificial intelligence, the tools are methods, algorithms or models. Any selection should be well supported by the literature and often by your own experiments. The performance of models is typically first evaluated on a set of benchmark datasets using standard metrics. For example, image classification is a typical type of problem with a wide variety of real-world applications, including photo and video categorization, visual search engines, product recommendation, self-driving vehicles, and disease diagnosis. Image classifiers are often benchmarked on datasets such as ImageNet, SVHN and ObjectNet before being applied to the specific datasets of the problem at hand. The benchmark results motivate researchers to further develop their methods and also provide a degree of assurance for practitioners, especially on problems with insufficient labels.

For MR topics, we encourage students to implement or evaluate recently proposed methods on benchmark datasets. For model comparison, the same experimental setup should be applied to all methods for fairness. For example, to compare model efficiency, all models should be executed on the same or similar devices. To investigate the effect of a component of a method, e.g., the normalization method in each layer, an ablation study that removes the component in question is often required. Finally, a systematic study of different hyper-parameter settings needs to be conducted to increase the rigor of your findings.

For the MR and IC projects in this assignment, you may use existing libraries (e.g., TensorFlow, Keras) or online resources (e.g., GitHub) to implement algorithms.
Sometimes, you may need to implement the algorithm or modify an existing implementation yourselves.

2.3 Industrial and Clinical Applications

At this stage, we apply the selected methods to real-world industrial and clinical problems. The big difference between real-world datasets and benchmark datasets is that benchmark datasets are often deliberately curated to evaluate certain merits of methods, while in practice there is a greater level of uncertainty in the data: the features may be noisy (and the noise is not necessarily Gaussian), the labels may be arbitrarily wrong (not according to any distribution), and there may be outliers, which are often associated with wrong measurements. As a result, models robust to noise and data preprocessing methods such as outlier detection may be significant in practice. In addition, for real datasets, domain knowledge certainly helps to build a better model, while on benchmark datasets we sometimes ignore domain knowledge in order to achieve a general model.

For IC projects, students are expected to apply existing methods and come up with creative solutions for analyzing a real-world dataset. From benchmarks in the literature, you may not find a consensus on what the 'best' model for your problem is. Therefore, we encourage you to find several candidate models, apply them, and modify them using any domain knowledge. As mentioned previously, students are encouraged to take advantage of any online resources.

3 Project Protocols

Students should follow the protocols below for this group assignment:

• This project may be completed by a group of students, where each student should contribute roughly equally to the expected outcomes (report, presentation, and code if applicable). Each group should have one to four students.
• For LR projects, each student in the group is expected to read in detail at least five research papers published in the recent decade (2010-2020). Additionally, you may cite other resources (papers, blogs, GitHub repositories) from any time where this is merited to explain founding concepts.
• For MR and IC projects, we encourage each group to explore several models and tune the hyper-parameters to get the best performance possible. Note that this takes a significant amount of time, so do not wait until the last week to run the experiments. However, due to the limited Spartan resources, we suggest that each group does not submit more than 4 jobs (or request more than 4 GPUs) at the same time.
• For LR projects, students may follow any review papers to learn academic writing and the organization of materials. For MR and IC projects, students may use any pre-trained models and existing code on GitHub, Canvas, or other platforms. However, copying published work (text, figures, tables, etc.) into your reports, or sharing any part of the outcomes among different groups in this subject, is considered plagiarism and is strictly prohibited.

In this assignment, each group of students is also welcome to propose its own project (labelled SS) and invite any one of the three coordinators to be the group coordinator. SS projects may roughly belong to any of the three topic categories, namely literature review, method research or applications. SS projects should be challenging, and ideally relevant to the deep learning or machine learning techniques introduced in this subject. For SS projects, the group should follow the same protocols as above and complete the expected outcomes listed in Section 4 Expected Deliverables.

4 Expected Deliverables

Each group should have one to four students and complete the following tasks for the chosen project:

1. An initial report, which may be extended into the final report;
2. A short presentation submitted as a video recording;
3. A final report.

For students working on MR and IC projects, Python code and result files (e.g., saved models if applicable) also need to be submitted.
For students working on Project IC.2. Kaggle Competition: Forecasting Global Daily Confirmed COVID-19 Cases, the submission results on the Kaggle website need to be submitted as a screenshot. For students proposing their own projects, a project proposal needs to be submitted.

For all written submissions, including the project proposal, initial report and final report, we only accept the PDF file format. You may use LaTeX (Overleaf) or any office software (Microsoft Word, Google Docs, etc.) to generate a PDF. We recommend the Times New Roman or Cambria font, with 14pt for section titles, 10-12pt with single spacing for the main text, and 8-9pt for captions of figures and tables. For references, the APA citation style is recommended. The page limits for written submissions do not include references (you may provide as many references as you want) or optional appendices. If applicable, you may provide in appendices the mathematical derivations, theoretical proofs, method details, supplementary experiments, figures and tables that cannot fit into the main text, but please note that the appendices will not be scored.

In the following, we describe each deliverable in detail.

Project proposal
Students who wish to propose their own projects should submit a project proposal by 6:00 pm on Wednesday 6th May 2020. The proposal should ideally be half a page to one page long. It should explain the impact, scope and measurable outcomes of the SS project (see Appendix A.4 for definitions) and state the name of the coordinator your group would like to nominate.

Initial report
In this task, the group is required to submit an initial report including the following content:

• Description and understanding of the project topic, methods and/or datasets;
• Review of existing methods to solve this or similar problems;
• Proposed assumption, hypothesis and/or solution towards solving this problem;
• Progress and the plan (including task allocation for each student in the group).
The initial report has a two-page limit, which includes main text, tables and figures but excludes references and appendices. The initial report is due at 6:00 pm on Friday 15th May 2020.

In your initial report as well as the final report, a highly logical flow should be presented in academic English. Ideally, your reports should assume that the reader has no extensive prior knowledge of your problem and your approach. For claims that are not supported by the results presented in your report, please cite references. Striking a good balance between rigor and clarity will be highly rewarded in our grading.

Short presentation
In this task, each group is required to give a presentation on the current progress, results and conclusions. The presentation should cover the following topics:

• Description and understanding of the project topic, methods and/or datasets;
• Ideas, hypotheses, current progress, results and conclusions;
• If applicable, encountered problems and possible solutions;
• Future work to be done before final submission.

Each group should nominate at least one member to give the presentation and submit it before 6:00 pm on Sunday 24th May 2020. The presenter(s) should have the full consent of the remaining member(s) of the group. The presentation should ideally last about 6 to 8 minutes. It should be submitted as a video in any standard format, e.g., AVI, MP4 or MPEG. Please note that the quality or resolution of the video and audio is not a concern here, as long as the presented content is visible on screen and the audio can be easily understood. Thus, there is no need to use a camcorder to record the video. Instead, we suggest using the camera and microphone of a laptop or phone and software such as Zoom. There is also no need to show the presenter(s) in the video as long as the slides are shown. If the video turns out to be too large to submit to Canvas, please submit a link to the video shared via any file-hosting platform such as OneDrive, Google Drive or Dropbox, and be sure to give the coordinators access to it.

Code files (where applicable)
Please compress all your code and relevant result files into a zip file and submit it to Canvas along with the final report.
Although you may use any language/platform, we highly recommend Python (3.6 or newer) and TensorFlow (2.1 or newer).

Submission result on Kaggle
For students working on Project IC.2 Kaggle Competition: Forecasting Global Daily Confirmed COVID-19 Cases, please upload a screenshot of your final Kaggle submission results along with your final report. Please note that, to get these submission results, you first have to register an account on Kaggle if you have not done so previously.

Final report
The final report should be a self-contained explanation of your project in at least four pages. You may extend your initial report into the final report. Please submit the final report before 6:00 pm on Friday 5th June 2020.

For all projects (LR, MR, IC, SS), please organize your writing in a logical structure. For LR projects, the following section structure is recommended:

1. Section 1: Introduction: Problem definition and significance, and scope of the review.
2. Section 2: Related Work: Summarize the literature and its rationale in an organized structure. There should be a logical flow between methods and paragraphs. You should discuss the strengths and weaknesses of the methods. You may draw contrasts as to which methods are best suited to which specific scenarios, etc.
3. Section 3: Discussion: Summarize your findings and give an assessment of the status quo. Ideally, you may find some gap areas where more research needs to be done.

For MR or IC projects, we encourage you to structure the report as follows:

1. Section 1: Introduction: Problem definition and significance.
2. Section 2: Related Work: Review of methods in the literature, which should ideally support your choice of approaches.
3. Section 3: Methods: Describe your approach in detail. You may provide algorithm blocks or flowcharts if needed.
4. Section 4: Results: Concisely describe your results and observations, illustrated with figures and tables. You may provide results for benchmark methods for comparison purposes.
5. Section 5: Discussion: Summarize your project work and findings. You may also comment on shortcomings, challenges and how you would wish to improve further.

5 Marking Criteria

This group project accounts for 25% of the final marks of the subject. The marks are divided among the tasks as follows:

1. Initial report – 5%
• The report is readable and does not have obvious technical errors – 1%.
• The report shows a good understanding of the selected topic – 1%.
• The review of methods is comprehensive and reasonable – 2% for LR projects, 1% for MR and IC projects.
• The proposed hypothesis is reasonable, and the solution is achievable – 1% for MR and IC projects; this criterion does not apply to LR projects.
• The progress and plan are satisfactory, and the task allocation is reasonable – 1%.

2. Presentation – 5%
• The presentation is well structured, and the idea is properly communicated – 2%.
• The content (introduction, methods, progress, results, and conclusions if applicable) is presented clearly without obvious technical errors – 3%.
Please note that the presentation will be scored by three examiners (the lecturer and two tutors) independently, and the average score will be used as the final score for the presentation.

3. Final report – 15% for LR projects, 5% for MR and IC projects.
• The final report extends the initial report accordingly – 2% for LR projects, 1% for MR and IC projects.
• The idea, methods and results are clearly and logically presented in academic English – 7% for LR projects, 1% for MR and IC projects.
• The analysis is reasonable and does not have obvious technical errors – 4% for LR projects, 2% for MR and IC projects.
• Based on the whole report, the observations and conclusions are correct – 2% for LR projects, 1% for MR and IC projects.

4. Code files and submission results on Kaggle – 10% for MR and IC projects; this criterion does not apply to LR projects.
• The code files and results are well structured, and the code is well commented – 1%.
• The saved model can be loaded easily and reproduces the predictions faithfully – 2%.
• The model predictions are accurate – 7%.

Please feel free to contact Richard if you need any clarification on the tasks, requirements, and marking criteria of Project 2.

Appendix A Topic Description

For details on the provided topics, please read the descriptions relevant to the projects you are interested in.

A.1 LR. Literature Review Projects

In the following, we introduce the topics briefly. Please note that these are general topics. In your outcomes, a good strategy is to focus on a relatively narrow aspect of the general topic. For example, in LR.1. below, instead of reviewing papers on reinforcement learning and Mechatronics in general, you may focus on a specific type of method (e.g., model-free reinforcement learning methods), or application (e.g., robotics control), or both (e.g., imitation learning in robotics control). Each student may use any online resources (tutorials, blogs, Google Scholar, etc.) to find at least five research articles related to the topic for a thorough reading. Students may also read blogs and literature review papers to learn how to write reviews. However, plagiarism is strictly forbidden.

LR.1. Reinforcement Learning in Mechatronics

Reinforcement learning (RL) is an area of machine learning that investigates how software agents ought to take actions in an environment in order to maximize the cumulative reward. RL deals with problems where the models connecting actions and rewards are hard to define and differentiate. Such problems are frequently encountered in mechatronic applications, including robotics and automation. In this project, students are expected to investigate the recent advances in the interdisciplinary area of RL and Mechatronics.
Note that Mechatronics is a broad area involving robotics, electronics, computer, telecommunications, systems, control, and product engineering. You may choose any sub-area of Mechatronics or discuss Mechatronics in general.

LR.2. Continual and Incremental Learning

Real intelligence acquires knowledge throughout a lifetime. The capability for continual learning is crucial in stepping towards artificial general intelligence. Additionally, in the real world, data and tasks are often presented incrementally. While models can be trained on an initial dataset or task, they can also be improved upon with new incoming data or tasks. However, conventional models tend to encounter an issue called catastrophic forgetting when presented with new data. Contextualizing new data with already learnt data is a significant problem applicable to both supervised and unsupervised learning. In this project, students are expected to investigate the state-of-the-art incremental and continual learning algorithms, which together step towards a lifelong learning capability.

LR.3. Recent Advances in Neuro Fuzzy Systems

Neuro-fuzzy refers to combinations of artificial neural networks and fuzzy logic. The main strength of neuro-fuzzy systems is that they are universal approximators with the ability to elicit interpretable IF-THEN rules. Combining the learning power of neural networks with the knowledge representation capabilities of fuzzy logic makes neuro-fuzzy systems an attractive candidate for AI researchers. In this project, students are expected to investigate the fundamental ideas of neuro-fuzzy systems, focusing on recent advances in the domain.

LR.4. Fundamentals and Recent Advances of Transfer Learning

Humans often learn new things by drawing on things they already know. If you have been driving a sedan for several years, you may learn to drive a truck or a bus by relating it to concepts you have already learned, such as acceleration, turning and slowing down. In transfer learning, the model uses some of its already inferred knowledge in new learning tasks rather than learning everything from scratch. This allows for highly efficient use of data. In this project, students are expected to investigate some recent advances along with the mathematical foundations of transfer learning.

LR.5. Causality Inference in Deep Learning – Towards Artificial General Intelligence

Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The neural networks we have learnt about in lectures so far deal with association inference, i.e., connecting data to labels or images to classes. Humans, however, can easily reason about causes and effects. As a result, causal inference research is considered to be the next trend in deep learning and a step towards artificial general intelligence. In this project, students are expected to investigate the strategies and recent methods for inferring causality from data.

LR.6. Neural Architecture Search – Identifying Optimal Neural Networks

Finding the optimal structure of a neural network for a given task is a labor-intensive process. Human experts spend a lot of time designing a good solution structure, i.e., selecting the number of layers, the number of neurons, activation functions, connections between neurons, etc., for which we often rely on prior knowledge and empirical evidence. It is thus reasonable to explore how we can automate this process in a more scientific manner. In this project, we investigate the recent advances, their philosophical underpinnings and the fundamental challenges.

LR.7. Anomaly Detection using Deep Neural Networks

Anomalies, or outliers, are defined as events that deviate from the standard. For example, in DNA sequence analysis, anomalies could be caused by contaminated samples. Anomalies happen rarely and do not follow the patterns of the rest of the data, which makes detecting them both hard and crucial. In traditional machine learning, researchers have created algorithms such as Isolation Forests, One-class SVMs, Elliptic Envelopes, and Local Outlier Factor to help detect anomalies. Can we utilize the powerful representation learning capabilities of deep learning for this task?
In this project, students are expected to investigate the challenges in anomaly detection, and recent deep learning solutions together with their hypotheses, foundations and gaps.

A.2 MR. Method Research

MR.1. Data Generation using Generative Adversarial Networks

Have you ever wondered how lifelike human faces are generated by Generative Adversarial Networks (GANs)? If not, go and check this cool website: This person does not exist (clickable link). In this project, you are required to benchmark a high-performing GAN on the CelebA dataset (downloaded here), or on CelebA-HQ, which is a high-resolution version of CelebA, using metrics such as Inception Score, FID or others. During the experiments, we encourage you to assess the following aspects of GANs:

• Running time. Does the image resolution significantly affect running time?
• Model stability. Is GAN training stable?
• Sensitivity to hyper-parameters. Do hyper-parameters significantly affect generated image quality?
• Mode collapse. Is there any portion of the data that never appears in the generated samples? Is there any way to measure mode collapse?

As a starting point, the following tutorials may be helpful:

• GAN lab: playing GANs in your browser.
• Blog: A Beginner's Guide to GANs.
• Paper with code: a search engine for papers that share code on GitHub.

MR.2. Data Visualization with Unsupervised Learning

Visualization of high-dimensional data is a preliminary step used in many data analyses. We can get a sense of the 'structure' of high-dimensional data by visualizing them. There have been significant recent advances in this field of study, with highly efficient algorithms being presented. In this study, you are expected to benchmark data visualization methods on a selected set of datasets. The goal is to familiarize yourself with the process of gaining information from unsupervised preliminary analysis that may help you with the downstream analysis.
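As a rough illustration of what such a benchmark could look like, the sketch below runs several candidate methods on the same toy data with the same seed, and records both wall-clock time and one simple quantitative quality score (the fraction of points whose nearest neighbour in the 2-D embedding shares their label). Everything here is an assumption made for illustration: `make_clusters`, `first_two_axes`, `random_projection`, `nn_label_agreement` and `benchmark` are hypothetical names, the two "methods" are trivial placeholders standing in for real visualization algorithms, and the quality score is only one of many possible choices.

```python
import random
import time


def make_clusters(n_per_class=30, dim=10, seed=0):
    """Toy data (assumption): two Gaussian blobs separated along the first axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, centre in enumerate((0.0, 8.0)):
        for _ in range(n_per_class):
            p = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            p[0] += centre
            points.append(p)
            labels.append(label)
    return points, labels


def first_two_axes(points, seed=0):
    """Placeholder method: keep the first two coordinates of each point."""
    return [p[:2] for p in points]


def random_projection(points, seed=0):
    """Placeholder method: project onto two random Gaussian directions."""
    rng = random.Random(seed)
    dim = len(points[0])
    axes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]
    return [[sum(a * x for a, x in zip(axis, p)) for axis in axes] for p in points]


def nn_label_agreement(embedding, labels):
    """Fraction of points whose nearest 2-D neighbour has the same label."""
    hits = 0
    for i, (xi, yi) in enumerate(embedding):
        best_j, best_d = None, float("inf")
        for j, (xj, yj) in enumerate(embedding):
            if i == j:
                continue
            d = (xi - xj) ** 2 + (yi - yj) ** 2
            if d < best_d:
                best_j, best_d = j, d
        hits += labels[best_j] == labels[i]
    return hits / len(embedding)


def benchmark(methods, points, labels, seed=0):
    """Run every method on identical data and seed; collect time and quality."""
    results = {}
    for name, method in methods.items():
        start = time.perf_counter()
        embedding = method(points, seed=seed)
        elapsed = time.perf_counter() - start
        results[name] = {
            "seconds": elapsed,
            "nn_agreement": nn_label_agreement(embedding, labels),
        }
    return results


if __name__ == "__main__":
    pts, lbls = make_clusters()
    candidates = {
        "first_two_axes": first_two_axes,
        "random_projection": random_projection,
    }
    for name, row in benchmark(candidates, pts, lbls).items():
        print(name, row)
```

In an actual project, the placeholder functions would be replaced by calls to the visualization methods under study, and the toy blobs by the chosen datasets; the point of the harness is only that every method sees the same inputs and is measured in the same way.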
During the experiments, you may compare the following properties of the data visualization methods:

• Running time.
• Quality of visualizations. Are the clusters well separated? Does the method preserve class hierarchies? You should compare methods both quantitatively and qualitatively.
• If the data are provided incrementally to the method, can the method adapt to new data increments?

You are also encouraged to pay attention to the following aspects:

• Size of the dataset, including the number of data instances and the dimensionality, and how they affect the running time of the method.
• Uniformity of the data. Are the data instances from a single cluster with certain properties spread around, or are there multiple clusters?
• Noise and outliers in the data. Would adding noise and outliers sabotage the visualization? Which method is the most robust?

We encourage you to be creative and explore other aspects of methods and data not mentioned above. The following tutorials may help you get started: (clickable links)

• Introduction to Dimensionality Reduction on Kaggle.
• Blog: Data visualization and dimensionality reduction using t-SNE.
• Nature paper: The art of using t-SNE.

A.3 IC. Industrial and Clinical Applications

IC.1. & IC.2. COVID-19 Related Projects

At the time of writing, COVID-19 has reportedly infected more than 3 million people in the world (clickable source: ECDC), causing a huge loss to public health, well-being and the economy. In this assignment, we provide two projects related to COVID-19 to give you an idea of how machine learning techniques can contribute to the fight against the coronavirus. These two projects are:

• IC.1. COVID-19 Pneumonia diagnosis using chest X-Rays.
• IC.2. Kaggle Competition: Forecasting Global Daily Confirmed COVID-19 Cases.

As a group, you may choose either one to form a project.
For Project IC.1., you will build a sample-efficient image classifier to distinguish chest X-rays of individuals with respiratory illness who test positive for COVID-19 from other X-rays. Ideally, the classifier should be interpretable, to promote the discovery of patterns in such X-rays. You may find more information and links to three datasets at this blog (clickable link). The three continually growing datasets are provided at:
(clickable links)

• COVID-19 image data collection (GitHub link).
• Figure 1 COVID-19 Chest X-ray Dataset Initiative (GitHub link).
• Kaggle competition: RSNA Pneumonia detection challenge.

For Project IC.2., you are expected to participate in a global epidemiological study and competition initiated by the White House Office of Science and Technology Policy (OSTP), US, and build a regression model to predict the daily spread in regions around the world. The links to view the background, datasets and other people’s solutions are listed below: (clickable links)

• COVID19 Global Forecasting (Week 1).
• COVID19 Global Forecasting (Week 2).
• COVID19 Global Forecasting (Week 3).
• COVID19 Global Forecasting (Week 4).

As time goes on, follow-up competitions will become available. Please participate in the latest competition in this series; you may use all data available to date.

IC.3. Incremental Learning in Industrial Mechatronics

Industrial components degrade over time. Machine learning provides a way to automate the monitoring of such components to keep industrial plants running. In this Kaggle competition (clickable link), the objective is to identify such degrading components in an unsupervised manner: you are required to detect any degradation and to identify whether a component replacement has happened. In this assignment, however, we go one step further. Although the competition provides you with a complete year’s worth of data, can we do this in real time? This means that when we monitor the equipment, we may not begin with a large set of data, but gather data incrementally as the processes run. Your challenge is to build on the work provided in the kernels of this competition, and the related publication, to see if and how you can learn to predict anomalies or degradation of components incrementally.
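To make the incremental setting concrete, here is a minimal Python sketch of a detector that never stores past readings: it keeps a running mean and variance using Welford's online update and flags a reading whose z-score exceeds a threshold. This is an illustrative baseline under stated assumptions, not the method used in the competition kernels or the related publication. The class name, the threshold of 3, the warm-up length and the synthetic stream are all hypothetical.

```python
import math
import random


class StreamingAnomalyDetector:
    """Flags readings far from the running mean, updated one sample at a time."""

    def __init__(self, threshold=3.0, warmup=30):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # samples to observe before flagging anything
        self.n = 0                  # number of samples seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Score x against the statistics so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = std > 0 and abs(x - self.mean) / std > self.threshold
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Synthetic stand-in for a sensor stream: 200 normal readings, then one spike.
random.seed(0)
stream = [random.gauss(0.0, 1.0) for _ in range(200)] + [8.0]
detector = StreamingAnomalyDetector()
flags = [detector.update(x) for x in stream]
print("spike flagged:", flags[-1])
```

Note that this sketch folds every reading, including flagged ones, into the running statistics, so a slow drift would gradually be absorbed as normal. Detecting gradual degradation would likely need a windowed or forgetting variant, which is left open here.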
Please check the following resources for more information: (clickable links)

• Kaggle competition: one-year-industrial-component-degradation and the kernel page.
• Previous Kaggle competition related to this project.
• Research paper on anomaly detection using Self-Organizing Maps.
• Damith’s recent manuscript on data visualization in incremental learning scenarios.

IC.4. Positive Unlabeled Learning in Drug Repositioning

Drug-drug interactions (DDIs) can occur when two drugs are administered to a patient simultaneously. However, verifying DDIs experimentally is a time-consuming and costly procedure. While computational methods are great candidates for identifying DDIs from the drugs’ chemical and therapeutic properties, the challenge is that known interactions are rare (a low number of positive labels) and most drug pairs have not been tested for interactions. In such a case, positive unlabeled learning is considered: a binary classification problem in which only a small number of data points are labeled. Note that for such data, supervised learning models may not be appropriate, as the model will simply overfit the small number of labeled samples. In this project, you are expected to investigate appropriate machine learning methods to predict possible DDIs by looking at the chemical and therapeutic properties of known drugs and a small number of known interactions. Please follow the related resources as the first step: (clickable links)

• Research paper on Positive Unlabeled Learning based DDI identification.
• DrugBank Database.
• Download links for preprocessed Drug Data.

A.4 SS. Student-Suggested Projects

In addition to the projects proposed above, we also encourage students to be creative and propose their own projects (SS), which may roughly be categorized as any of LR, MR and IC. However, SS projects need to be approved by the subject coordinators before you commence them.
A good project should have the following properties:

• Impact: The project should have considerable relevance to machine learning methods, for example by addressing a significant methodological problem or solving a problem that impacts a large community.
• Scope: The project should be planned so that it can be concluded before the final report submission (Friday 5 June 2020).
• Measurable Outcomes: The project should be self-sustained, i.e., it should provide its own resources, such as publications, blogs, code repositories and datasets, as well as outcomes such as software libraries or conclusive reports.
