
STATS 200 Alabama A & M University Heart Disease Questions

Question Description

A health insurance company collected information on 788 of its subscribers who had made claims resulting from ischemic (coronary) heart disease. Data were obtained on the total costs of services provided for these 788 subscribers and on the nature of the various services over a certain time period. The variables consist of the following:

1. ID: Identifying number 1-788

2. Total Claim Cost: Total cost of claims by subscriber (dollars)

3. Age: Age of subscriber (years)

4. Gender: Gender of subscriber: 1 if they identify as male, 2 if they identify as female, and 3 otherwise.

5. Interventions: Total number of interventions or procedures carried out

6. Drugs: Number of tracked drugs prescribed

7. ER Visits: Number of emergency room visits

8. Complications: Number of other complications that arose during heart disease treatment

9. Comorbidities: Number of other diseases that the subscriber had during period

10. Duration: Duration of the treatment condition (days)

We are going to perform various data management and analysis tasks on this data set. Answer each of the following questions. When you are done, upload your R script, your datasets, and PDFs of your graphs to Blackboard.
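The Part 1 preparation steps (questions 1 to 6) can be sketched in R roughly as follows. This is a minimal sketch, not an official solution. It assumes the CSV headers match the variable names listed above, so that read.csv() turns spaces into dots (Total.Claim.Cost, ER.Visits). So that the code runs without the real file, it writes a small made-up stand-in with only a few of the columns to a temporary folder.

```r
# --- Question 1: point R at the folder holding the CSV -----------------------
# Assumption: for a runnable demo we use a temporary folder and write a small
# made-up file there. With the real data use setwd("path/to/your/folder").
old_wd <- setwd(tempdir())   # setwd() returns the previous working directory

toy <- data.frame(
  ID = 1:5,
  `Total Claim Cost` = c(250, 1200, 8400, 0, 3150),  # invented values
  Age = c(24, 45, 59, 60, 72),
  Gender = c(1, 2, 2, 3, 1),
  Drugs = c(0, NA, 1, 0, 2),
  `ER Visits` = c(1, 3, NA, 0, 2),
  check.names = FALSE   # keep the spaces a spreadsheet header would have
)
write.csv(toy, "Ischemic Heart Disease.csv", row.names = FALSE)

# --- Question 2: read the file into R ----------------------------------------
# read.csv() rewrites "Total Claim Cost" as Total.Claim.Cost (check.names = TRUE)
heart <- read.csv("Ischemic Heart Disease.csv")
setwd(old_wd)            # restore the original working directory

# --- Question 3: look for problems -------------------------------------------
summary(heart)           # an "NA's" row appears for columns with missing values
colSums(is.na(heart))    # number of NAs in each column

# --- Question 4: replace every NA in the data frame with 0 -------------------
heart[is.na(heart)] <- 0

# --- Question 5: recode Gender as a labelled factor --------------------------
heart$Gender <- factor(heart$Gender, levels = c(1, 2, 3),
                       labels = c("Male", "Female", "Otherwise"))
is.factor(heart$Gender)  # TRUE, so no further conversion is needed

# --- Question 6: Age Category ------------------------------------------------
# right = FALSE gives the intervals [-Inf, 40), [40, 60), [60, Inf)
heart$Age.Category <- cut(heart$Age, breaks = c(-Inf, 40, 60, Inf), right = FALSE,
                          labels = c("Young Adult", "Middle Aged", "Older"))
```

On question 4: replacing NA with 0 is only defensible if a blank count really means "none recorded". If it means "unknown", zeros pull means and standard deviations downward, and keeping the NAs (with na.rm = TRUE in later computations) may be the safer choice.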

Part 1: Preparation

1. Include a setwd() command to set the working directory to the folder where you saved the .csv file.

2. Import the .csv file from your computer into R.

3. Get a summary of all the variables to determine whether there are any issues. If there are, identify how you will handle them.

4. There are issues with NAs. Change the NA values to 0, and describe whether or not you think this is OK to do.

5. The gender variable is not immediately obvious.
   a. Replace the values with Male for 1, Female for 2, and Otherwise for 3.
   b. Verify that the variable is a factor and, if not, change it to one.

6. Create a new variable called Age Category which has the value Young Adult if the person is under 40 years old, Middle Aged if the person is between 40 and 59 years old, and Older if the person is 60 years or older.

Part 2: Numerical Computations

7. Determine the maximum and minimum for Total Claim Cost.

8. Determine the total amount of Total Claim Cost for the entire data set.

9. Compute the mean and standard deviation of Total Claim Cost.

10. Use the table() function to create a table and get a distribution of gender.

11. Divide the table by the length of the gender vector to get relative frequencies.

12. Compute Q1 and Q3 for Age, Interventions and ER Visits.

13. For each of those variables, compute the Lower Fence = Q1 - 1.5 × IQR and the Upper Fence = Q3 + 1.5 × IQR, where IQR = Q3 - Q1.

14. Create three separate data frames that contain the outliers (if any) for Age, for Interventions, and for ER Visits.

15. Compute the max, min, mean, and standard deviation of Total Claim Cost for each of those data frames and compare them to the overall values computed in questions 7 and 9.

16. Compute the distribution of Gender for each of those data frames and compare it to the overall values computed in question 11.

17. For the original data frame, count the number of people using at least one drug.

18. For the original data frame, create a new variable called Interactions which is the sum of Interventions and ER Visits.
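Questions 7 to 18 can be sketched compactly in R as below. This assumes a cleaned data frame named heart with the dotted column names that read.csv() produces and Gender already recoded as a factor. The eight rows are invented purely so the code runs, and cost_summary() and fences() are hypothetical helper names, not part of the assignment.

```r
# Made-up toy data standing in for the cleaned data frame from Part 1
heart <- data.frame(
  Total.Claim.Cost = c(300, 900, 1500, 2200, 2600, 3100, 4000, 9800),
  Gender = factor(c("Male", "Female", "Male", "Male", "Female", "Male", "Otherwise", "Female"),
                  levels = c("Male", "Female", "Otherwise")),
  Age = c(24, 52, 58, 61, 63, 66, 70, 45),
  Interventions = c(2, 3, 3, 4, 5, 6, 40, 7),
  ER.Visits = c(0, 1, 1, 2, 2, 3, 3, 12),
  Drugs = c(0, 1, 0, 2, 1, 0, 3, 1)
)

# Questions 7 and 9: min, max, mean and sample standard deviation (n - 1 denominator)
cost_summary <- function(df) {
  x <- df$Total.Claim.Cost
  c(min = min(x), max = max(x), mean = mean(x), sd = sd(x))
}
overall_cost <- cost_summary(heart)

# Question 8: total claim cost
total_cost <- sum(heart$Total.Claim.Cost)

# Questions 10-11: counts per level, then relative frequencies
gender_counts <- table(heart$Gender)
gender_rel <- gender_counts / length(heart$Gender)

# Questions 12-13: quartiles (quantile() default, type 7) and fences
fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  c(Q1 = q[1], Q3 = q[2], lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)
}
vars <- c("Age", "Interventions", "ER.Visits")
fence_table <- t(sapply(heart[vars], fences))   # one row per variable

# Question 14: rows strictly outside the fences, one data frame per variable
outliers <- lapply(vars, function(v) {
  f <- fence_table[v, ]
  heart[heart[[v]] < f["lower"] | heart[[v]] > f["upper"], ]
})
names(outliers) <- vars

# Questions 15-16: the same summaries for each outlier subset
# (sd() of a single row is NA; a zero-row data frame means "no outliers")
lapply(outliers, cost_summary)
lapply(outliers, function(df) table(df$Gender) / length(df$Gender))

# Question 17: TRUE counts as 1 when summed
n_on_drugs <- sum(heart$Drugs >= 1)

# Question 18: new column added to the original data frame
heart$Interactions <- heart$Interventions + heart$ER.Visits
```

quantile()'s default method can differ slightly from textbook hand calculations of Q1 and Q3, so fences computed here may not match a calculator exactly.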

Part 3: Graphs

Create each of the following graphs AND describe its features. Each graph should be appropriately labeled.

19. Create a barplot of Age Category and describe it.

20. Create a histogram of ER Visits and describe it.

21. Create a boxplot of Interventions and describe it.
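A minimal base-R sketch of the three graphs, written to a single PDF for upload. The data are made up, the output path is a temporary-folder placeholder, and the Age Category column is rebuilt with cut() as in question 6 so that the block runs on its own.

```r
# Made-up toy data; replace with the cleaned data frame from Part 1
heart <- data.frame(
  Age = c(24, 35, 41, 52, 58, 60, 63, 70, 77),
  ER.Visits = c(0, 1, 1, 2, 0, 3, 1, 4, 2),
  Interventions = c(1, 2, 3, 3, 4, 5, 6, 8, 20)
)
heart$Age.Category <- cut(heart$Age, breaks = c(-Inf, 40, 60, Inf), right = FALSE,
                          labels = c("Young Adult", "Middle Aged", "Older"))

pdf_file <- file.path(tempdir(), "heart_graphs.pdf")  # use your own path for the real upload
pdf(pdf_file)   # each plot below becomes one page of the PDF

# Question 19: bar heights are the counts in each Age Category
barplot(table(heart$Age.Category), main = "Subscribers by Age Category",
        xlab = "Age category", ylab = "Number of subscribers")

# Question 20: histogram of ER visits (bins from R's default Sturges rule)
hist(heart$ER.Visits, main = "Histogram of ER Visits",
     xlab = "Number of ER visits", ylab = "Number of subscribers")

# Question 21: the box spans the hinges (close to Q1 and Q3); whiskers reach the
# most extreme points within 1.5 box lengths, and points beyond are drawn singly
boxplot(heart$Interventions, main = "Boxplot of Interventions",
        ylab = "Number of interventions")

dev.off()       # close the PDF device so the file is written
```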

File: Ischemic Heart Disease.csv

