Tutoring Case: T2 2020

  • June 23, 2020

Actuarial Data and Analysis, T2 2020
Assignment Part A
Due time: Week 5 Wednesday, 1 July 2020, 11.55 am (sharp)

1 Skills developed

This assignment gives you an opportunity to become familiar with the given datasets before applying the modeling techniques you are learning in the course lectures to a business task involving data. It will also develop your skills in understanding and applying data manipulation and analysis methods (from the course materials and any additional reference material you consider). Communicating the results of your investigations and analysis is another important skill developed.

2 Task

You are a fresh actuarial graduate who has just joined the US Medicare Fraud Department as an analyst. Your team is in charge of analyzing Medicare data to detect Medicare fraud committed by providers. Your manager has tasked you with providing a preliminary report on the attached datasets so that you become familiar with the data and the Medicare provider characteristics and are ready for further analysis. Your main tasks involve data manipulation and analysis, as well as a report and a recommendation for further analysis (i.e. modeling). Note that all relevant steps in the data manipulation, as well as the data analysis results, should be included in the report or appendix.

3 Additional information and mark allocation

3.1 Data manipulation and analysis (17 marks)

You should manipulate the data you have to prepare it for analysis. This includes (but is not limited to): data exploration, data cleaning (if necessary), combining all the datasets, and aggregating the data per provider (see the Resources section for documentation). The analysis should provide a good sense of the datasets, insights into beneficiary, claim and provider characteristics, and motivation for further analysis. You may find interesting insights by analysing both the combined and the aggregate datasets.
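The combining and per-provider aggregation steps named in Section 3.1 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the expected solution: the rows are made-up toy data, the `ClaimType` tag and the summary field names (`n_claims`, `n_beneficiaries`, `total_reimbursed`, `mean_reimbursed`) are hypothetical, and only the column names `ProviderID`, `BeneID`, `ClaimID`, `InscClaimAmtReimbursed` and `Fraud` come from the data description. It uses the Python standard library only; in practice a data-frame package such as dplyr or pandas would express the same group-and-join logic more compactly.

```python
from collections import defaultdict

# Hypothetical toy rows; the real files have many more rows and columns.
providers = [
    {"ProviderID": "P1", "Fraud": "yes"},
    {"ProviderID": "P2", "Fraud": "no"},
]
inpatient = [
    {"ClaimID": "C1", "BeneID": "B1", "ProviderID": "P1", "InscClaimAmtReimbursed": 5000.0},
    {"ClaimID": "C2", "BeneID": "B2", "ProviderID": "P2", "InscClaimAmtReimbursed": 3000.0},
]
outpatient = [
    {"ClaimID": "C3", "BeneID": "B1", "ProviderID": "P1", "InscClaimAmtReimbursed": 100.0},
    {"ClaimID": "C4", "BeneID": "B3", "ProviderID": "P1", "InscClaimAmtReimbursed": 300.0},
]


def stack_claims(inpatient_rows, outpatient_rows):
    """Stack both claim tables, tagging each row with a hypothetical ClaimType."""
    stacked = [dict(row, ClaimType="inpatient") for row in inpatient_rows]
    stacked += [dict(row, ClaimType="outpatient") for row in outpatient_rows]
    return stacked


def aggregate_per_provider(claims, provider_rows):
    """Return one summary row per provider, joined to the Fraud label."""
    fraud_by_provider = {p["ProviderID"]: p["Fraud"] for p in provider_rows}
    amounts = defaultdict(list)
    beneficiaries = defaultdict(set)
    for claim in claims:
        amounts[claim["ProviderID"]].append(claim["InscClaimAmtReimbursed"])
        beneficiaries[claim["ProviderID"]].add(claim["BeneID"])
    summary = {}
    for provider_id, values in amounts.items():
        summary[provider_id] = {
            "n_claims": len(values),
            "n_beneficiaries": len(beneficiaries[provider_id]),
            "total_reimbursed": sum(values),
            "mean_reimbursed": sum(values) / len(values),
            # None when a provider has claims but no label row.
            "Fraud": fraud_by_provider.get(provider_id),
        }
    return summary


summary = aggregate_per_provider(stack_claims(inpatient, outpatient), providers)
```

Keeping the stacked claim-level table alongside the per-provider summary makes it possible to explore both levels, as the brief suggests.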
This task does not involve modeling, but you should keep in mind that the question your team will ultimately be looking at is which providers are likely to have fraudulent claims. See the section on data for details. The mark allocation for the assignment can be found in the rubrics (on the course Moodle webpage).

3.2 Presentation Format (3 marks)

Communicating quantitative results in a concise and easy-to-read manner is a vital skill in practice. As such, marks will be given for the presentation of your results. To maximize your marks for presentation you may wish to consider issues such as: table size/readability, figure axes/formatting, ease of reading, grammar/spelling, and report structure. You may also wish to consider the use of executive summaries and appendices, where appropriate.

Provide sufficient detail (in the report body and/or appendices) so that the reviewer can judge what you are doing and follow all the steps and derivations required in your work. Use appendices for results that are useful but not essential to the report.

Note that a maximum page limit of 2 pages (excluding tables and graphs) applies to the main body of the report. [1] There is no limit to the length of the appendix. You should also consider the rubric for the presentation component (on the course Moodle webpage).

[1] Please note that this is a maximum; feel free to use fewer pages if that is sufficient.

3.3 Software

You may choose which software package to use (e.g. R, Python or other); however, nearly every function you will need for this task is available in R. Note also that code enabling you to perform most of the computing can be found in the learning activities of the course and in the Resources section. Any assumptions you make must be clearly identified and justified.

4 Data

The data relate to US Medicare claims and beneficiary details for 4436 providers from 2008 to 2009 and consist of 4 datasets:

1. Medicare_Provider.csv
2. Medicare_Inpatient.csv
3. Medicare_Outpatient.csv
4. Medicare_Beneficiary.csv

Similar (but not identical) datasets are provided here. You may wish to check that webpage for further information about the context, data and problem. [2] You may also wish to look at the following exploratory data analysis based on the Kaggle datasets to get an idea of why and how to start the data analysis: Healthcare Fraud Detection With Python: The importance of exploratory data analysis (weblink here). That analysis is just a brief example and is not based on your datasets. Different and additional variables may be of interest for your analysis.

[2] Optional readings for extra information and context on Medicare fraud in the US can be found here: link 1 and link 2.

4.1 Medicare_Provider.csv (Provider Data)

This dataset provides the provider ID and whether or not each provider is fraudulent.

Variable: Description
ProviderID: A unique ID assigned to each provider (character)
Fraud: Is fraudulent? (categorical: "no", "yes")

4.2 Medicare_Inpatient.csv (Inpatient Data)

This dataset provides insights about the claims filed for patients who are admitted to hospital. It also provides additional details about the admission and discharge dates and the diagnosis code.
Variable: Description
BeneID: A unique ID assigned to each beneficiary (chr)
ClaimID: A unique ID assigned to each claim (chr)
ClaimStartDt: Start date of the claim (date)
ClaimEndDt: End date of the claim (date)
InscClaimAmtReimbursed: Claim amount reimbursed (num)
AttendingPhysician: Attending physician (chr)
OperatingPhysician: Operating physician (chr)
OtherPhysician: Other physician (chr)
AdmissionDt: Admission date (date)
ClmAdmitDiagnosisCode: Claim admission diagnosis code (chr)
DeductibleAmtPaid: Deductible amount paid (num)
DischargeDt: Discharge date (date)
DiagnosisGroupCode: Diagnosis group code (chr)
ClmDiagnosisCode_1: Claim diagnosis code 1 (chr)
ClmProcedureCode_1: Claim procedure code 1 (num)
ProviderID: A unique ID assigned to each provider (chr)

Important remark: The variables ClmAdmitDiagnosisCode, DiagnosisGroupCode, ClmDiagnosisCode_1 and ClmProcedureCode_1 correspond to specific international or national codifications. [3] You don't need to know or understand the details of what the codes mean. You can treat those variables as categorical and investigate only the most significant levels.

• ClmAdmitDiagnosisCode represents the diagnosis code on the institutional encounter indicating the beneficiary's initial diagnosis at admission. This diagnosis code may not be confirmed after the patient is evaluated; it may differ from the eventual diagnoses.
• DiagnosisGroupCode represents the diagnostic group to which a hospital claim belongs. It is a unique identifier of a hospital case type that is based on similar clinical problems.
• ClmDiagnosisCode_1 represents the diagnosis code in the 1st position identifying the condition(s) for which the beneficiary is receiving care.
• ClmProcedureCode_1 indicates the principal procedure performed during the period covered by the institutional claim.
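As a rough illustration of working with the inpatient fields described in Section 4.2, the sketch below derives a length of stay from AdmissionDt and DischargeDt and keeps only the most frequent levels of a code variable, lumping the rest together. Everything beyond the column names is an assumption: the rows and codes are invented, ISO-formatted date strings are assumed (the actual file format should be checked), and `length_of_stay`, `lump_rare_levels` and the "Other" label are hypothetical helpers. Treating the most frequent levels as the "most significant" ones is one possible reading of the remark, not the only one.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: IDs, dates and codes are made up, and ISO date strings
# are an assumption about how the CSV stores dates.
inpatient = [
    {"ClaimID": "C1", "AdmissionDt": "2009-01-01", "DischargeDt": "2009-01-05", "DiagnosisGroupCode": "G1"},
    {"ClaimID": "C2", "AdmissionDt": "2008-12-30", "DischargeDt": "2009-01-02", "DiagnosisGroupCode": "G1"},
    {"ClaimID": "C3", "AdmissionDt": "2009-03-10", "DischargeDt": "2009-03-10", "DiagnosisGroupCode": "G1"},
    {"ClaimID": "C4", "AdmissionDt": "2009-05-01", "DischargeDt": "2009-05-08", "DiagnosisGroupCode": "G2"},
    {"ClaimID": "C5", "AdmissionDt": "2009-06-15", "DischargeDt": "2009-06-16", "DiagnosisGroupCode": "G2"},
    {"ClaimID": "C6", "AdmissionDt": "2009-02-27", "DischargeDt": "2009-03-01", "DiagnosisGroupCode": "G3"},
]


def length_of_stay(row):
    """Number of days between admission and discharge (0 for same-day stays)."""
    admitted = date.fromisoformat(row["AdmissionDt"])
    discharged = date.fromisoformat(row["DischargeDt"])
    return (discharged - admitted).days


def lump_rare_levels(values, top_n, other_label="Other"):
    """Keep the top_n most frequent levels and replace all others with other_label."""
    top_levels = {level for level, _ in Counter(values).most_common(top_n)}
    return [value if value in top_levels else other_label for value in values]


stays = [length_of_stay(row) for row in inpatient]
lumped = lump_rare_levels([row["DiagnosisGroupCode"] for row in inpatient], top_n=2)
```

Lumping rare levels keeps frequency tables and plots readable when a code variable has many distinct values.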
[3] Reference: Research Data Assistance Center, weblink here.

4.3 Medicare_Outpatient.csv (Outpatient Data)

This dataset provides details about the claims filed for patients who visited hospitals as outpatients.

Variable: Description
BeneID: A unique ID assigned to each beneficiary (chr)
ClaimID: A unique ID assigned to each claim (chr)
ClaimStartDt: Start date of the claim (date)
ClaimEndDt: End date of the claim (date)
InscClaimAmtReimbursed: Claim amount reimbursed (num)
AttendingPhysician: Attending physician (chr)
OperatingPhysician: Operating physician (chr)
OtherPhysician: Other physician (chr)
ClmDiagnosisCode_1: Claim diagnosis code 1 (chr)
ClmProcedureCode_1: Claim procedure code 1 (num)
DeductibleAmtPaid: Deductible amount paid (num)
ClmAdmitDiagnosisCode: Claim admission diagnosis code (chr)
ProviderID: A unique ID assigned to each provider (chr)

4.4 Medicare_Beneficiary.csv (Beneficiary Details Data)

This dataset contains individual beneficiary details (e.g. date of birth, date of death, health conditions, state, etc.).
Variable: Description
BeneID: A unique ID assigned to each beneficiary (chr)
DOB: Date of birth (date)
DOD: Date of death (date)
Gender: Gender 1 or 2 (categorical)
Race: Race 1 to 5 (categorical)
RenalDiseaseIndicator: Renal disease indicator "0" (No) or "Y" (Yes) (chr)
State: US state number (num)
County: County (num)
NoOfMonths_PartACov: Number of months Medicare Part A covered (num)
NoOfMonths_PartBCov: Number of months Medicare Part B covered (num)
ChronicCond_Alzheimer: Chronic condition Alzheimer 1 (Yes) or 2 (No) (num)
ChronicCond_Heartfailure: Chronic condition Heart failure 1 (Yes) or 2 (No) (num)
ChronicCond_KidneyDisease: Chronic condition Kidney Disease 1 (Yes) or 2 (No) (num)
ChronicCond_Cancer: Chronic condition Cancer 1 (Yes) or 2 (No) (num)
ChronicCond_ObstrPulmonary: Chronic condition Obstructive Pulmonary 1 (Yes) or 2 (No) (num)
ChronicCond_Depression: Chronic condition Depression 1 (Yes) or 2 (No) (num)
ChronicCond_Diabetes: Chronic condition Diabetes 1 (Yes) or 2 (No) (num)
ChronicCond_IschemicHeart: Chronic condition Ischemic Heart 1 (Yes) or 2 (No) (num)
ChronicCond_Osteoporasis: Chronic condition Osteoporosis 1 (Yes) or 2 (No) (num)
ChronicCond_rheumatoidarthritis: Chronic condition Rheumatoid arthritis 1 (Yes) or 2 (No) (num)
ChronicCond_stroke: Chronic condition Stroke 1 (Yes) or 2 (No) (num)
IPAnnualReimbursementAmt: Inpatient annual reimbursement amount (num)
IPAnnualDeductibleAmt: Inpatient annual deductible amount (num)
OPAnnualReimbursementAmt: Outpatient annual reimbursement amount (num)
OPAnnualDeductibleAmt: Outpatient annual deductible amount (num)

5 Resources

• Data manipulation with R: dplyr (weblink here)
• Merging with R (weblink here)
• Tidy data in R (weblink here)
• Exploratory Data Analysis with R (weblink here)
• Data visualisation in R with ggplot2 for fancy plots (weblink here)
• For any code-related question, google.com and stackoverflow.com are pretty helpful!
• As usual, you can ask your questions on the course Ed forum.
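The beneficiary indicators listed in Section 4.4 use mixed codings (1/2 for the chronic conditions, "0"/"Y" for RenalDiseaseIndicator), so a cleaning step might recode them to a common form before analysis. The following is a minimal sketch under stated assumptions: the example row is invented, values are assumed to arrive as strings from the CSV, DOB is assumed to be an ISO-formatted date, and the helper names, the `Age` and `RenalDisease` output fields, and the choice of 31 December 2009 as the reference date for age are all hypothetical.

```python
from datetime import date

# Assumption: age is measured at the end of the 2008-2009 observation window.
REFERENCE_DATE = date(2009, 12, 31)


def recode_chronic(value):
    """Map the documented coding 1 (Yes) / 2 (No) to True / False."""
    code = int(value)
    if code == 1:
        return True
    if code == 2:
        return False
    raise ValueError(f"unexpected chronic condition code: {value!r}")


def recode_renal(value):
    """Map the documented coding "Y" (Yes) / "0" (No) to True / False."""
    if value == "Y":
        return True
    if value == "0":
        return False
    raise ValueError(f"unexpected renal indicator: {value!r}")


def age_on(dob_iso, reference=REFERENCE_DATE):
    """Completed years between the date of birth and the reference date."""
    dob = date.fromisoformat(dob_iso)
    had_birthday = (reference.month, reference.day) >= (dob.month, dob.day)
    return reference.year - dob.year - (0 if had_birthday else 1)


def clean_beneficiary(row):
    """Return a cleaned copy with an age and boolean health indicators."""
    cleaned = {
        "BeneID": row["BeneID"],
        "Age": age_on(row["DOB"]),
        "RenalDisease": recode_renal(row["RenalDiseaseIndicator"]),
    }
    for name, value in row.items():
        if name.startswith("ChronicCond_"):
            cleaned[name] = recode_chronic(value)
    return cleaned


# Hypothetical raw row as it might be read from the CSV (all values are strings).
raw = {
    "BeneID": "B1",
    "DOB": "1940-07-15",
    "RenalDiseaseIndicator": "Y",
    "ChronicCond_Diabetes": "1",
    "ChronicCond_stroke": "2",
}
cleaned = clean_beneficiary(raw)
```

Raising on unexpected codes, rather than silently mapping them to "No", surfaces data-quality issues during cleaning. For beneficiaries with a recorded DOD, age at death may be a more sensible choice than age at the reference date.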
6 Assignment submission procedure

6.1 Turnitin submission

Your assignment report must be uploaded as a single document, and all parts must be in portrait format. As long as the due date has not passed, you can resubmit your work; the previous version of your assignment will be replaced by the new version.

Assignments must be submitted via the Turnitin submission box that is available on the course Moodle website. Turnitin reports on any similarities between your cohort's assignments, and also with regard to other sources (such as the internet or all assignments submitted around the world via Turnitin). More information is available at: [click]. Please read this page, as we will assume that you are familiar with its content. You can also find the Turnitin Similarity Report Interpretation Guide (2019) on the Moodle webpage.

Please also submit any programming code used in your analysis as a separate file in the dedicated "Code only" Moodle assignment box on the course webpage. The code will be referred to by the marker only if needed; in particular, the report (with appendix) should be self-contained.

You need to check your document once it is submitted (check it on-screen). We will not mark assignments that cannot be read on screen.

Students are reminded of the risk that technical issues (such as internet connection problems and/or computer breakdowns) may delay or even prevent their submission. Students should allow enough time (at least 24 hours is recommended) between their submission and the due time. The Turnitin module will not let you submit a late report. No paper copy will be accepted or graded.

6.2 Late submission

Please note that it is School policy that late submission of assignments will incur a penalty. A penalty of 25% of the mark the student would otherwise have obtained applies for each full (or part) day of lateness (e.g., 0 day 1 minute = 25% penalty, 2 days 21 hours = 75% penalty).
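The late-penalty arithmetic stated in Section 6.2 can be checked with a short calculation: lateness is rounded up to whole days, and each day costs 25% of the mark that would otherwise have been obtained. The sketch below is illustrative only; the function names are hypothetical, and capping the penalty at 100% is an assumption (the policy text does not say what happens beyond four days).

```python
from datetime import timedelta

PENALTY_PER_DAY = 0.25  # 25% per full or part day, as stated in the policy
ONE_DAY = timedelta(days=1)


def late_penalty(lateness):
    """Fraction of the mark lost for a given lateness (capped at 1.0 by assumption)."""
    if lateness <= timedelta(0):
        return 0.0
    full_days, remainder = divmod(lateness, ONE_DAY)
    days_charged = full_days + (1 if remainder else 0)  # a part day counts as a day
    return min(1.0, PENALTY_PER_DAY * days_charged)


def penalized_mark(mark, lateness):
    """Mark after deducting the late penalty from the mark otherwise obtained."""
    return mark * (1.0 - late_penalty(lateness))


one_minute_late = late_penalty(timedelta(minutes=1))
almost_three_days_late = late_penalty(timedelta(days=2, hours=21))
```

With these definitions, the two examples from the policy give 25% and 75% respectively, and a report worth 16 marks submitted one minute late would receive 12.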
Students who are late must submit their assignment to the LIC via e-mail. The LIC will then upload the documents to the relevant submission boxes. The date and time at which the e-mail is received determine the submission time for the purposes of calculating the penalty. More information on late submissions, extensions and special consideration is available on the Moodle course webpage, in the section Additional resources from UNSW (at the bottom).

6.3 Plagiarism awareness

Students are reminded that the work they submit must be their own. While we have no problem with students working together on the assignment problems, the material students submit for assessment must be their own.

Students should make sure they understand what plagiarism is. Cases of plagiarism have a very high probability of being discovered. For issues of collective work, having different persons marking the assignment does not decrease this probability.

More information on academic integrity and plagiarism is available on the Moodle course webpage, in the section Additional resources from UNSW (at the bottom).
