sh <- suppressPackageStartupMessages
sh(library(tidyverse))
sh(library(caret))
sh(library(class))
sh(library(ISLR)) # for the "College" dataframe
Setup
Dataframe
- We use the `College` dataframe.
head(College)
Private Apps Accept Enroll Top10perc Top25perc
Abilene Christian University Yes 1660 1232 721 23 52
Adelphi University Yes 2186 1924 512 16 29
Adrian College Yes 1428 1097 336 22 50
Agnes Scott College Yes 417 349 137 60 89
Alaska Pacific University Yes 193 146 55 16 44
Albertson College Yes 587 479 158 38 62
F.Undergrad P.Undergrad Outstate Room.Board Books
Abilene Christian University 2885 537 7440 3300 450
Adelphi University 2683 1227 12280 6450 750
Adrian College 1036 99 11250 3750 400
Agnes Scott College 510 63 12960 5450 450
Alaska Pacific University 249 869 7560 4120 800
Albertson College 678 41 13500 3335 500
Personal PhD Terminal S.F.Ratio perc.alumni Expend
Abilene Christian University 2200 70 78 18.1 12 7041
Adelphi University 1500 29 30 12.2 16 10527
Adrian College 1165 53 66 12.9 30 8735
Agnes Scott College 875 92 97 7.7 37 19016
Alaska Pacific University 1500 76 72 11.9 2 10922
Albertson College 675 67 73 9.4 11 9727
Grad.Rate
Abilene Christian University 60
Adelphi University 56
Adrian College 54
Agnes Scott College 59
Alaska Pacific University 15
Albertson College 55
- The ISLR textbook describes the variables as follows:
Name | Description |
---|---|
Private | Public/private indicator |
Apps | Number of applications received |
Accept | Number of applicants accepted |
Enroll | Number of new students enrolled |
Top10perc | New students from top 10% of high school class |
Top25perc | New students from top 25% of high school class |
F.Undergrad | Number of full-time undergraduates |
P.Undergrad | Number of part-time undergraduates |
Outstate | Out-of-state tuition |
Room.Board | Room and board costs |
Books | Estimated book costs |
Personal | Estimated personal spending |
PhD | Percent of faculty with Ph.D.’s |
Terminal | Percent of faculty with terminal degree |
S.F.Ratio | Student/faculty ratio |
perc.alumni | Percent of alumni who donate |
Expend | Instructional expenditure per student |
Grad.Rate | Graduation rate |
Multiple Regression
- Run a linear regression model with `Grad.Rate` as the dependent variable and `PhD` and `Expend` as features (variables).
  - Regard `PhD` and `Expend` as two forms of investment in education: in training for instructors, and in resources for students.
- Compute and comment on the RMSE.
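As a starting point, the two steps above might be sketched as follows. This is a minimal sketch rather than a model answer: it assumes an in-sample RMSE is acceptable (no train/test split), and the names `fit` and `rmse` are illustrative, not part of the assignment.

```r
library(ISLR)  # provides the College dataframe

# Fit Grad.Rate on the two "investment" features.
fit <- lm(Grad.Rate ~ PhD + Expend, data = College)

# In-sample RMSE: square root of the mean squared residual.
# Assumption: no held-out test set is used here.
rmse <- sqrt(mean(residuals(fit)^2))
rmse
```

The RMSE is in the same units as `Grad.Rate` (percentage points), so one way to comment on it is to compare it with `sd(College$Grad.Rate)`, the error of always predicting the mean.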
# Your code here
TODO: Explain
Feature Engineering
- Create 6+ total features. Consider:
  - Attributes of the student body.
    - For example, an acceptance rate, or the percentage of students in other categories vs. accepted/enrolled.
  - Costs of the university.
  - Some other category, such as one related to success, alumni, or faculty.
- Remove all rows with a missing value.
- Ensure only `Grad.Rate` and the engineered features remain.
- Compute and comment on the RMSE.
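One possible shape for this pipeline is sketched below. The six features and their names (`accept_rate`, `yield`, `part_time`, `total_cost`, `expend_ratio`, `alumni_share`) are illustrative assumptions, not the required set, and the RMSE is again computed in-sample.

```r
library(ISLR)       # provides the College dataframe
library(tidyverse)  # mutate(), select(), drop_na()

engineered <- College %>%
  mutate(
    # Student body
    accept_rate  = Accept / Apps,      # share of applicants accepted
    yield        = Enroll / Accept,    # share of accepted students who enroll
    part_time    = P.Undergrad / (F.Undergrad + P.Undergrad),
    # Costs
    total_cost   = Outstate + Room.Board + Books + Personal,
    expend_ratio = Expend / Outstate,  # instructional spend per tuition dollar
    # Alumni
    alumni_share = perc.alumni / 100
  ) %>%
  # Keep only the response and the engineered features.
  select(Grad.Rate, accept_rate, yield, part_time,
         total_cost, expend_ratio, alumni_share) %>%
  # Remove all rows with a missing value.
  drop_na()

# Regress Grad.Rate on every engineered feature and compute in-sample RMSE.
fit_fe  <- lm(Grad.Rate ~ ., data = engineered)
rmse_fe <- sqrt(mean(residuals(fit_fe)^2))
rmse_fe
```

Comparing `rmse_fe` with the RMSE of the two-feature model gives a concrete basis for the written comment.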
# Your code here
TODO: Explain
Naive Classification
- Use either \(K\)-NN or Naive Bayes to predict whether a college is `Private`.
- Explain your choice of technique.
- Report on your Kappa value.
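If \(K\)-NN is chosen, a `caret`-based version might look like the sketch below. The 80/20 split, the seed, 5-fold cross-validation, and the names `train_df`, `test_df`, `knn_fit`, and `cm` are all assumptions made for illustration; the assignment does not prescribe them.

```r
library(ISLR)   # provides the College dataframe
library(caret)  # createDataPartition(), train(), confusionMatrix()

set.seed(1)  # arbitrary seed, for reproducibility only

# Hold out 20% of the colleges for evaluation, stratified by Private.
idx      <- createDataPartition(College$Private, p = 0.8, list = FALSE)
train_df <- College[idx, ]
test_df  <- College[-idx, ]

# K-NN is distance-based, so center and scale the predictors first.
knn_fit <- train(
  Private ~ ., data = train_df,
  method     = "knn",
  preProcess = c("center", "scale"),
  trControl  = trainControl(method = "cv", number = 5),
  tuneLength = 5  # try five candidate values of K
)

# Kappa on the held-out colleges.
cm <- confusionMatrix(predict(knn_fit, test_df), test_df$Private)
cm$overall["Kappa"]
```

Kappa corrects accuracy for chance agreement, which matters here because most colleges in the data are private and a majority-class guess already scores well on raw accuracy.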
# Your code here
TODO: Explain
Improved Classification
- Predict whether a college is `Private`.
- Use model weights.
- Display and comment on an ROC curve.
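The sketch below shows one reading of these steps, under loudly stated assumptions: "model weights" is interpreted as per-observation case weights that balance the two classes, the model is a logistic regression on three arbitrarily chosen predictors, and the ROC curve is computed by hand in base R on the training data. None of these choices come from the assignment itself.

```r
library(ISLR)  # provides the College dataframe

# Assumption: case weights that give each class the same total weight.
class_n <- table(College$Private)
w <- as.numeric(nrow(College) / (2 * class_n[as.character(College$Private)]))

# Weighted logistic regression; quasibinomial is used so that
# non-integer weights do not trigger warnings.
fit_w <- glm(Private ~ Outstate + F.Undergrad + S.F.Ratio,
             data = College, family = quasibinomial, weights = w)

p <- predict(fit_w, type = "response")  # estimated P(Private == "Yes")
y <- College$Private == "Yes"

# Sweep the decision threshold from high to low to trace the ROC curve.
thresholds <- c(Inf, sort(unique(p), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) mean(p[y] >= t))   # true positive rate
fpr <- sapply(thresholds, function(t) mean(p[!y] >= t))  # false positive rate

plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (in-sample)")
abline(0, 1, lty = 2)  # reference line for a random classifier

# Area under the curve via the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

Because the curve is drawn on the same data used for fitting, it is optimistic; repeating the calculation on a held-out set would give a fairer picture.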
# Your code here
TODO: Explain
Ethics
- Based on your analysis, comment on the for-profit privatization of education, perhaps through the framework advanced by this article:
- Discuss the civic responsibilities of data scientists for:
- Big Data and Human-Centered Computing
- Democratic Institutions
- Education and Educational Policy
- Provide at least one statistical measure for each, such as an RMSE, Kappa value, or ROC curve.
Big Data and Human-Centered Computing
TODO: Big Data and Human-Centered Computing
# Your code here
Democratic Institutions
TODO: Democratic Institutions
# Your code here
Education and Educational Policy
TODO: Education and Educational Policy
# Your code here