Skip to main content
留学咨询

辅导案例-PSTAT130

By May 15, 2020No Comments

PSTAT130: SAS Base Program Project: Actuarial Science with SAS Claire Mouminoux Winter 2020 Instructions: Please note that you are responsible for the work sharing. Each group will submit two files on GauchoSpace • Your SAS code in .txt format – Please follow the libname name, vari- able names and database names instruction to facilitate correction. • The report produced using the OUTPUT DELIVERY SYSTEM (ODS) as .pdf file The code must be appropriately commented to refer to the instruction num- ber, and the output customized with title so that we understand which problem you are treating. Do not forget to add your names at the top of the code. 1 Context You have just been hired by a Californian Insurance compagny CA SUN INSURANCE as a statistical analyst and SAS programer in the technical department of motor insur- ance. The company wants to renew their motor insurance price strategy for the 2020 new business 1. To do so, you will have to analyze their past data to propose an ade- quate price regarding: – expected costs of motor claims for the different profile of customers; – the volume of new business (number of new contracts); – overall expected profit generated by the 2020 new business. In addition, you will colaborate with the marketing department which will give you some information about customers’ behavior and their probability to underwrite a contract regarding their quotes (i.e. the price that you will determine). The company has high expectation from your work and you will have to generate an easely understandable SAS report to explain your price strategy. Also, you will have to deal with some constraints imposed by your managers and demonstrate to them that you have followed their recommendations. 1The 2020 new business corresponds to all new insureds who will underwrite a contract for the first time at CA SUN INSURANCE throughout 2020 1 2 All you need to know 2.1 Actuarial Science: the art of estimating future costs 2.1.1 Pure Premium: Frequency × Severity To estimate futur claim costs for a cohort i 2, actuaries and statistical analysts use the past data collected over years to determine two important components: – The frequency: corresponds to the number of claims of a cohort over the number of individuals within this cohort: Freqi , it could be interpreted as the probability to have a claim. – The severity: corresponds to the average cost of claims within the cohort i: Costi . Example – Step 1: Name Age Gender Claims Claim Cost Pierre 18 H Yes 500 Paul 50 H Yes 800 Luke 18 H No Claire 50 F No Anna 50 F Yes 200 Maria 18 F Yes 400 Jude 18 F Yes 500 Peter 50 H No Sylvia 18 F Np Henri 18 H Yes 300 Oliver 18 H Yes 900 Suppose we define a cohort according to the following characteristics: Age and Gen- der. In this example, we have 4 different cohorts corresponding to 4 profiles. We need to determine: – the number of individuals within each cohort (number of individuals having the same profile): NB Individuals; – the number of claims recorded within each cohort: NB Claims; – the sum of claim costs within each cohort: Sum of Claim Costs. 2A cohort is a group of individuals having the same characteristics such as the same age and the same car for instance. It corresponds to the profile of an individual. 2 Cohort i Age Gender NB Individuals NB Claims Sum of Claim Cost 1 18 H 4 3 1700 2 50 H 2 1 800 3 18 F 3 2 900 4 50 F 2 1 200 We are now able to compute the frequency Freqi and the severity Costi of claims for each cohort i = {1,2,3,4}. These two elements allow us to compute the P ure P remiumi (P Pi) for each individual given his/her profile (i.e. given the cohort i in which he/she belongs to). The pure premium is defined by: P Pi = Freqi ×Costi . It corresponds to the expected average claim costs of an individual belonging to cohort i. Indeed, it takes into account both: the probability of an individual with a given profile to have a claim and the expected costs of the claim if any. Cohort i Age Gender Freqi Costi P Pi 1 18 H 0.75 1700 425 2 50 H 0.5 800 400 3 18 F 0.67 900 300 4 50 F 0.5 200 100 2.1.2 The Loaded Premium: Pure Premium × (1+Loading Factor) The costs of claims is not the only cost to take into account for an insurance com- pany. Besides the price of repairing a car after an accident, for instance, the insurance company has to remunerate specialists (such as actuaries, accident experts, accounting officers,…): theses costs are called the handling costs. In general, we consider these costs as proportional to the expected average costs of claims of an insured (i.e. the pure premium). In other words, policyholders who have more claims or/and more costly claims generate higher handling costs (i.e. need of an expert, take more processing time, …). The Loaded P remiumi (LPi) for each individual belonging to a cohort i is equal to: LPi = P Pi × (1 +λ) where λ > 0 is the Loading Factor. The loading factor is usually given by the account- ing department of the insurance company. Example – Step 2: Let’s return to our example. We have already determined the P Pi for each profile/cohort i. Suppose that the accounting department of the insurance company gives you the fol- lowing information: 3 For a claim cost of $500 we also have additional handling costs of $25. We thus estimate the loading factor at 5% = 25/500 We are now able to estimate the future costs (including claim costs and handling costs) for each profile. Cohort i Age Gender P Pi (1 +λ) LPi 1 18 H 425 1.05 446 2 50 H 400 1.05 420 3 18 F 300 1.05 315 4 50 F 100 1.05 105 2.2 The Insurance Price Strategy 2.2.1 Insurance Price: Loaded Premium × (1+Margin Rate) The Insurance P ricei (Pi) is the final price proposed to a new prospect 3. To make prof- its, the insurance company cannot define: Insurance Price = Loaded Premium. They need to generate a margin on each contract and will define a price strategy by defining a Margin Rate. Therefore, the Insurance P ricei (Pi) is : Pi = LPi × (1 + β). where β > 0 is the Margin Rate. Example – Step 3: Let’s return to our example! Suppose that the managers are hesitating between two price strategies: – Strat A: βA = 0.01 – Strat B: βB = 0.05 We first compute the Insurance Price for each profile i and each strategy A or B: Cohort i Age Gender LPi Pi,A = LPi × (1 + βA) Pi,B = LPi × (1 + βB) 1 18 H 446 451 469 2 50 H 420 424 441 3 18 F 315 318 331 4 50 F 105 106 110 3A prospect is a customer who asks for a quote, but who is not already a client/insured of the com- pany. He/She will decide to become a client if he/she agrees with the Insurance Price. 4 2.2.2 Determining the Best Strategy At this stage, we are not able to determine which of A or B price strategy is the best. A highest price means a largest margin BUT a highest price also means a lowest proba- bility that a prospect (the customer who receives the quote) accepts it and becomes a new client! We need more information about customers’ behaviors and that’s a good news since an entire department is studying that! Indeed, the quantitative marketing departe- ment of an insurance company studies the behavior of prospects with respect to an insurance price quote. In other words, they compute the probability of a prospect to underwrite the insurance contract given its price. To do so, they generally use a logistic regression where : P rob(Accept|Pj) = 11 + exp(f (Pj)) Here P rob(Accept|Pj) is the probability that a prospect j receiving an insurance price Pj accepts it and becomes a new clients: a New Business contract. To determine the best strategy you will need two elements from the marketing depart- ment: – The function f (Pj) that they have estimated; – A database of prospects to compute the effect of your two insurance price strategies. Example – Step 4: The marketing department gives you the following information. Please find below our estimated function and the database of prospect clients for 2020 P rob(Accept|Pj) = 11 + exp(f (Pj)) with f (Pj) = −0.45× LPj Pj + 0.001× Pj we consider that a prospect becomes a new business client if P rob(Accept|Pj) > 0.5. 5 Prospect_ID Age Gender 1 18 F 3 18 F 5 18 F 6 18 F
7 18 F 12 50 F 14 50 F 2 18 H 4 18 H 8 18 H 9 18 H 10 50 H 11 50 H 13 50 H 15 50 H Now we have to determine: – what price the prospect will receive depending on each strategy A or B; – what is the probability that a prospect accepts depending on each strategy A or B and the information given by the marketing department ; – who accepts depending on each strategy A or B and the rule: acceptance if P rob(Accept|Pj) > 0.5. Prospect_ID Age Gender LP P_A P_B P(Accept |P_A) P(Accept |P_B) Accept | P_A Accept | P_B 1 18 F 315 318 331 0.532 0.524 Yes Yes 3 18 F 315 318 331 0.532 0.524 Yes Yes 5 18 F 315 318 331 0.532 0.524 Yes Yes 6 18 F 315 318 331 0.532 0.524 Yes Yes 7 18 F 315 318 331 0.532 0.524 Yes Yes 12 50 F 105 106 110 0.584 0.579 Yes Yes 14 50 F 105 106 110 0.584 0.579 Yes Yes 2 18 H 446 451 469 0.499 0.490 No No 4 18 H 446 451 469 0.499 0.490 No No 8 18 H 446 451 469 0.499 0.490 No No 9 18 H 446 451 469 0.499 0.490 No No 10 50 H 420 424 441 0.505 0.497 Yes No 11 50 H 420 424 441 0.505 0.497 Yes No 13 50 H 420 424 441 0.505 0.497 Yes No 15 50 H 420 424 441 0.505 0.497 Yes No Now we are able to determine the best strategy to adopt and the insurance price to offer depending on the prospect customer profile. The expected profit is equal to: Π(A) = j=15∑ j=1 (Pj,A −LPj)×1j,A, for strategy A 6 or Π(B) = j=15∑ j=1 (Pj,B −LPj)×1j,B, for strategy B. Here 1j,A = 1 if the prospect j accepts the insurance price PA and 0 otherwise. Simi- larly, 1j,B = 1 if the prospect j accepts the insurance price PB and 0 otherwise. Prospect_ID LPj Pj,A Pj,B Pj,A −LPj Pj,B −LPj 1j,A 1j,B (Pj,A −LPj )× 1j,A (Pj,B −LPj )× 1j,B 1 315 318 331 3 13 1 1 3 13 3 315 318 331 3 13 1 1 3 13 5 315 318 331 3 13 1 1 3 13 6 315 318 331 3 13 1 1 3 13 7 315 318 331 3 13 1 1 3 13 12 105 106 110 1 4 1 1 1 4 14 105 106 110 1 4 1 1 1 4 2 446 451 469 5 18 0 0 0 0 4 446 451 469 5 18 0 0 0 0 8 446 451 469 5 18 0 0 0 0 9 446 451 469 5 18 0 0 0 0 10 420 424 441 4 17 1 0 4 0 11 420 424 441 4 17 1 0 4 0 13 420 424 441 4 17 1 0 4 0 15 420 424 441 4 17 1 0 4 0 Sum 11 7 33 73 The best insurance price strategy is PB with a total profit of $73 and a new business volume of 7. 2.2.3 The Final Price table We determined the best price strategy to adopt to maximise the profit regarding to the risk of each profile, under the constraints and recommandations of managers and the help of other departments in the insurance company. Besides a report to sum up your statistical analysis, your work ouput (a datatable) is the final insurance price table depending on each profile (i.e. combination of prospects’ characteristics). This insurance price table will be used to determine which price to propose to each new prospects. Example – Step 5: In our exemple the final output correspond to the table bellow: Age Gender Final Price 18 H 469 50 H 441 18 F 331 50 F 110 7 2.3 Others comments Of course an insurance company has more data and this is the reason why SAS is es- sential to manage big database. The following sections of this document will explain you, step by step, how to do it using SAS! Good luck and Enjoy! 3 Step by Step Project Instruction 3.1 Computation of future costs In this section you will compute the frequency and severity for the different profile based on previous claims data of the company. As a reminder, for each new database it is important to use the proc contents procedure to have database information. Since 2015, the company keep all the information about the insureds in their portefolio and the claims that occured. You have at your disposal two database from the historical data of the company: • ptf.sas7bdat: corresponds to the portfolio of clients of the company from 2015 to 2019 and their characteristics: cars type, birthdate, zip code, their policyholder id and the ongoing contract starting date (i.e. a policyholder can be present sev- eral time in the portfolio database if it stay several years in the insurance com- pany but will have a different ongoing contract date). • claims.sas7bdat: corresponds to each claim registered by the company from 2015 to 2019 and includes: the policyholder id, the ongoing contract date and the cost of the claim. 3.1.1 Cohorts’ profile The company want to create cohorts according to the following information: – age: the age of the client with the following level : age ≤ 21: “-21”, 21 < age ≤ 35: "21-35", 35 < age ≤ 60: "35-60" and 60 < age: "+60"; - hp: the horsepower of the car with the following level: horsepower ≤ 150: "low", 150 < horsepower ≤ 300: "medium", 300 < horsepower : "high"; - density: the density of the ZIP code where is living with the following level: population ≤ 4000: "low", 4000 < population ≤ 30000: "medium", 30000 < population : "high"; Therefore, a profile correspond to the combinaison of these three variables. All in all there are 36 different profiles: 4(age)× 3(hp)× 3(density) = 36. Instruction 1: Import the ptf.sas7bdat database using the LIBNAME procedure with "data" as the name of the path. Using the information of the data.ptf database, compute the age of each policyholder and create the character variable ’age’ (with the informat 8 $15.) according to levels described before. The output should contain all the previous variable already exisiting in data.ptf database and the new variable age. The output is work.ptf: it is a database stored in the temporary library "work" and named ptf. Hint: Be carefull, for each row we want to have the age of the policyholder with respect to the date when the contract was ongoing. The if statement if the simplest way to create the variable with the different level of age. In addition, insurers often use external data to improve their statistical analysis. CA SUN INSURANCE, uses two different external database: - cars.csv: each row includes a cars type ("id" column) and its different features (in- cluding horsepower). - CA_ZIP_CODE.TXT: each row includes a californian zip code ("zip_code" column) and the corresponding population ("population" column). Instruction 2: Import the cars.csv in sas as work.cars using Proc import procedure and create the character variable "hp" in the work.cars database (with the informat $15.) according to levels decribed before. Merge it with the work.ptf database accord- ing to the "id_cars" variable. The output is again the work.ptf database. Instruction 3: Import the CA_ZIP_CODE.TXT in sas as work.CA_ZIP_CODE using Proc import procedure and create the character variable "density" in work.cars (with the informat $15.) according to levels decribed before. Merge it with the work.ptf database according to the "Zip_code" variable. The output is again the work.ptf database. For each row of the ptf database you have now the three characteristics variables in order to define the policyholder profile. 3.1.2 Computation of the loaded premium You have to determine the profile of policyholders having a claim. Instruction 4: Import the claims.sas7bdat database using the LIBNAME procedure with "data" as the name of the path. Merge the data.claims and the work.ptf by "policy- holder_id" and "policy_starting_date" to recover the characterics variable of each pol- icyholder having a claim for each year of contract. The ouput should be a database named "claims" and stored in the temporary library work. Hint: Be carefull we want to have in the output database only the rows present on the claims database. We now want to know the number of claims and the average cost of claim (severity) for each profile and for each contract year. Instruction 5: Create a variable "year" in the work.claims database corresponding to the year of the ongoing contract and create a database work.claims_summary includ- ing the number of claims and the average cost of claims for each year and profile. The average cost of claim variable should be named "cost" and the number of claims vari- able should be named "nb_claims". 9 Hint: the PROC MEANS procedure is the easiest way to do it (do not forget the output out statement). You need to know the number of policyholder on in each cohort and for each year in order to determine the frequency of claim for each p rofile and for each year. Instruction 6: Create a variable "year" in the work.ptf database corresponding to the year of the ongoing contract and create a database work.ptf_summary including the number of policyholders for each year and profile. The number of policyholder vari- able should be named "nb". Hint: the PROC MEANS procedure is the simplest way to do it (do not forget the output out statement). Now you have to compute the frequency of claim for each year and profile. Instruction 7: Create a work.summary database corresponding to the merge between work.claims_summary and work.ptf_summary by "age", "density", "population" and "year" and compute the frequency of claims for each years and profile using variables "nb_claims" and "nb". The frequency variable is named "freq". For now, you have analyzed the frequency and the average cost of claim for each profile and for each year of contract. Indeed, to define frequency to consider (over the years) for the computation of the pure premium, the manager asks you to be "conservative" and keep for each profile, the year where the frequency is the highest. Instruction 8: Create a work.freq database corresponding to the highest frequency for each profile. Keep on this database only the "age", "density", "hp" and "freq" vari- ables. Hint: the PROC SORT procedure is the simplest way to do it (do not forget the nodupkey and be carefull on the order of by variables). Similarly, to define averale claim costs to consider (over the years) for the computa- tion of the pure premium, the manager asks you to be "conservative" and keep for each profile, the year where the cost is the highest. Instruction 9: Create a work.cost database corresponding to the highest average cost of claims for each profile. Keep on this database only the "age", "density", "hp" and "cost" variables. Hint: the PROC SORT procedure is the simplest way to do it (do not forget the nodupkey and be carefull on the order of by variables). Now you can easely determine the pure premium since you have for each profile the cost and frequency of claim. Instruction 10: Create a work.pp database corresponding to the merge of the work.cost and work.freq database by "age", "density" and "hp" variables. Create a new variable "pp" corresponding to the pure premium. Hint: the "pp" database should have 36 observations corresponding to the 36 possible 10 profiles. Instruction 11: Create a new variable "lp" in the work.pp database corresponding to the loaded premium according to the following information given by the accounting department: For a claim cost of $500 we also have additional handling costs of $25. We thus estimate the loading factor at 5% = 25/500 3.2 Definition of the Best Insurance Price Strategy Your managers want to use one of the following strategies: - Strategy A: loading factor equal to 5%; - Strategy B: loading factor equal to 10%; - Strategy C: loading factor equal to 15%; and they need you to determine which want generate the highest profit. 3.2.1 Define the quote received for each strategy and for each prospect Instruction 12: Create a new database work.price using the work.pp database. In this database keep only the "age", "density", "hp", "lp" variables and create three new vari- ables "pA", "pB", "pC", corresponding respectively to the price of strategy A, B and C . From the example at the beginning of this document, we know that the price is not sufficient to determine the profit generated by a stratgegy. You receive the following information from the marketing department: Please find below our estimated function and the database of prospect clients for 2020 (Prospect.csv) P rob(Accept|Pj) = 11 + exp(f (Pj)) with f (Pj) = −0.1× LPj Pj + 0.002× (Pj −LPj) we consider that a prospect becomes a new business client if P rob(Accept|Pj) > 0.5. Instruction 13: Import the Prospect.csv database given by the marketing department as work.prospect using Proc import procedure Similarly to Instruction 1, 2 and 3, re- covered the three characteristics to determine the profile of each prospect. The output should be a database stored in the temporary work library named prospect. Hint: the work.cars and work.CA_ZIP_CODE database are already existing. Instruction 14: Merge the work.prospect database with the work.price database by “age”, “density” and “hp” in order to determine the price that they will receive for each strategy A, B and C. The output should be the work.prospect database. 11 3.2.2 Compute the profit and volume generated by each strategy Instruction 15: Create for each row three new variable “prob_a”, “prob_b” and “prob_c” in work.prospect database, corresponding to the probability that the prospect receiv- ing respectively the price “pa”, “pb” and “pc” accepts the offer. Hint: of course you need to take into account the information given by the marketing department. Instruction 16: Create for each row three new variables “accept_a”, “accept_b” and “accept_c” in work.prospect database equal to 1 if the prospect accepts respectively the price “pa”, “pb” and “pc” and 0 otherwise. Hint: the if statement is the simplest way to do it. Instruction 17: Create a new database work.table_volume corresponding to the vol- ume of new business for each strategy. Hint: the Proc MEANS is the simplest way to do it. Instruction 18: Compute the profit for each row of the work.prospect database and call the variable respectively “pi_a”, “pi_b” and “pi_c” in the work.prospect database for strategy a, b and c. Instruction 19: Create a new database work.table_profit corresponding to the profit for each strategy. Hint: the Proc MEANS is the simplest way to do it. You are now able to determine the best strategy to implement to maximise the profit for the new business in 2020 of this motor insurance company! 3.3 Create a report You job is almost finished but you need to provide the key elements of your reasoning and the final output (i.e. the table of the insurance price by profile that you recom- mand in order to optimize the new business profit). Your manager ask you to create a pdf report with the following information (in this order): • On the top of each page your name and ucsb email adress • A summary table with the average claim frequency (used in the pp) by age of the current portfolio • A summary table with the average claim frequency (used in the pp) by density of the current portfolio • A summary table with the average claim frequency (used in the pp) by horse power of the current portfolio • A summary table with the average claim costs (used in the pp) by age of the current portfolio 12 • A summary table with the average claim costs (used in the pp) by density of the current portfolio • A summary table with the average claim costs (used in the pp) by horse power of the current portfolio • A summary table with the average pure premium of the current portfolio • The profile with the lowest pure premium • The profile with the highest pure premium • A summary table with the volume of new business of each strategy • A summary table with the total profit of each strategy • The final pricing strategy table Instruction 20: Create a pdf report using ODS procedure named “final_report_lastnames.pdf”. To avoid to have one page for each output (corresponding to each procedure in the STATEMENT>) and to add your name at the top of each page use the following ODS op- tions: options nodate pdfpageview=FITPAGE; ods NOPROCTITLE; ODS PDF style=journal File=”…final_report_lastname.pdf” startpage=no; title1 “Homework Project PSTAT130”; title2 “Your names; title3 “Your email adresses”; ods layout gridded; ods layout end; ODS PDF CLOSE; This ODS procedure can generate a warning in the log windows : “WARNING: Layout is not supported in the RTF(WEB) destination. Layout will be ignored.” do not take it into account. Complete the with the following instruc- tions. Instruction 21: Create a work.ptf_pp database by merging the work.ptf database and the work.pp database previously created. Keep only the current portefolio (i.e. the policy ongoing date equal to 2019). Instruction 22: Using Proc mean procedure on the work.ptf_pp databa
se, display three consecutive tables for the average claim frequency by age, density and horse- power. 13 Hint: do not forget to add a title to each procedure Instruction 23: Using Proc mean procedure on the work.ptf_pp database, display three consecutive tables for the average claim costs by age, density and horsepower. Hint: do not forget to add a title to each procedure Instruction 24: Using Proc mean procedure on the work.ptf_pp database, display the average pure premium of the current portfolio. Instruction 25: Display only the three variables of the profile with the lowest pure premium. Hint: Use Proc sort procedure and proc print procedure with obs option. Instruction 26: Display only the three variables of the profile with the highest pure premium. Hint: Use Proc sort procedure and proc print procedure with obs option. Instruction 27: Using the database work.table_volume previsouly created , display the volume of new business generated by each strategy. Add a temporary label to re- place the name column with Strategy A, Strategy B and Strategy C. Hint: Use Proc print procedure Instruction 28: Using the database work.table_profit previsouly created , display the profit generated by each strategy. Add a temporary label to replace the name column with Strategy A, Strategy B and Strategy C. Add a footnote to give the best strategy chosen. Hint: Use Proc print procedure Instruction 29: Create a database work.final_price using “work.price database. Keep only the characteristics variables used to determine profiles and rename the price strat- egy selected by “best_price”. Print this database without the observation index . 14

admin

Author admin

More posts by admin