Characterizing Colleges

Author

Practice

Published

March 10, 2025

Setup

  • Load the required packages, suppressing their startup messages.
sh <- suppressPackageStartupMessages
sh(library(tidyverse))
sh(library(caret))
sh(library(class))
sh(library(ISLR)) # for the "College" dataframe

Dataframe

  • We use the College dataframe.
head(College)
                             Private Apps Accept Enroll Top10perc Top25perc
Abilene Christian University     Yes 1660   1232    721        23        52
Adelphi University               Yes 2186   1924    512        16        29
Adrian College                   Yes 1428   1097    336        22        50
Agnes Scott College              Yes  417    349    137        60        89
Alaska Pacific University        Yes  193    146     55        16        44
Albertson College                Yes  587    479    158        38        62
                             F.Undergrad P.Undergrad Outstate Room.Board Books
Abilene Christian University        2885         537     7440       3300   450
Adelphi University                  2683        1227    12280       6450   750
Adrian College                      1036          99    11250       3750   400
Agnes Scott College                  510          63    12960       5450   450
Alaska Pacific University            249         869     7560       4120   800
Albertson College                    678          41    13500       3335   500
                             Personal PhD Terminal S.F.Ratio perc.alumni Expend
Abilene Christian University     2200  70       78      18.1          12   7041
Adelphi University               1500  29       30      12.2          16  10527
Adrian College                   1165  53       66      12.9          30   8735
Agnes Scott College               875  92       97       7.7          37  19016
Alaska Pacific University        1500  76       72      11.9           2  10922
Albertson College                 675  67       73       9.4          11   9727
                             Grad.Rate
Abilene Christian University        60
Adelphi University                  56
Adrian College                      54
Agnes Scott College                 59
Alaska Pacific University           15
Albertson College                   55
  • The ISLR textbook describes the variables as follows:
Name         Description
Private      Public/private indicator
Apps         Number of applications received
Accept       Number of applicants accepted
Enroll       Number of new students enrolled
Top10perc    New students from top 10% of high school class
Top25perc    New students from top 25% of high school class
F.Undergrad  Number of full-time undergraduates
P.Undergrad  Number of part-time undergraduates
Outstate     Out-of-state tuition
Room.Board   Room and board costs
Books        Estimated book costs
Personal     Estimated personal spending
PhD          Percent of faculty with Ph.D.’s
Terminal     Percent of faculty with terminal degree
S.F.Ratio    Student/faculty ratio
perc.alumni  Percent of alumni who donate
Expend       Instructional expenditure per student
Grad.Rate    Graduation rate

Multiple Regression

  • Run a linear regression model with Grad.Rate as the dependent variable and PhD and Expend as features (variables).
    • Regard PhD and Expend as two forms of investment in education: training for instructors and resources for students.
  • Compute and comment on the RMSE.
# Your code here
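
One possible starting point for the regression task above is sketched here. It is an illustration, not a reference solution: it uses base R's lm rather than caret, and it reports the in-sample RMSE, which is an assumption about what the exercise expects (a held-out or cross-validated RMSE would be a stricter measure). The baseline comparison is added only to give the RMSE some context.

```r
library(ISLR)  # provides the College dataframe

# Fit Grad.Rate on the two "investment" features.
fit <- lm(Grad.Rate ~ PhD + Expend, data = College)

# In-sample RMSE: square root of the mean squared residual.
rmse <- sqrt(mean(residuals(fit)^2))

# Baseline for context: RMSE of always predicting the mean graduation rate.
baseline_rmse <- sqrt(mean((College$Grad.Rate - mean(College$Grad.Rate))^2))

print(c(model = rmse, baseline = baseline_rmse))
```

Because Grad.Rate is measured in percentage points, both numbers can be read as a typical prediction error in percentage points. The gap between them suggests how much the two features explain.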

TODO: Explain

Feature Engineering

  • Create at least six features in total. Consider:
    • Attributes of the student body.
      • For example, an acceptance rate, or a percentage of students in other categories vs. accepted/enrolled.
    • Costs of the university.
    • Some other category, such as related to success, alumni, or faculty.
  • Remove all rows with a missing value.
  • Ensure only Grad.Rate and the engineered features remain.
  • Compute and comment on the RMSE.
# Your code here
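
The steps listed above could be sketched as follows. All six feature names (accept_rate, yield, part_time_share, total_cost, expend_per_tuition, phd_per_sf_ratio) are hypothetical choices made for illustration, and the RMSE is again computed in-sample with lm, which is an assumption rather than a requirement of the exercise.

```r
library(tidyverse)
library(ISLR)  # provides the College dataframe

# Engineer six illustrative features (names are hypothetical),
# keep only Grad.Rate plus those features, and drop incomplete rows.
engineered <- College %>%
  mutate(
    accept_rate        = Accept / Apps,                               # student body
    yield              = Enroll / Accept,                             # student body
    part_time_share    = P.Undergrad / (F.Undergrad + P.Undergrad),   # student body
    total_cost         = Outstate + Room.Board + Books + Personal,    # costs
    expend_per_tuition = Expend / Outstate,                           # costs
    phd_per_sf_ratio   = PhD / S.F.Ratio                              # faculty
  ) %>%
  select(Grad.Rate, accept_rate, yield, part_time_share,
         total_cost, expend_per_tuition, phd_per_sf_ratio) %>%
  drop_na()

# Regress Grad.Rate on every engineered feature.
fit <- lm(Grad.Rate ~ ., data = engineered)

# In-sample RMSE, in percentage points of graduation rate.
rmse <- sqrt(mean(residuals(fit)^2))
print(rmse)
```

Comparing this RMSE with the one from the two-feature model indicates whether the engineered ratios carry more signal than the raw PhD and Expend columns.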

TODO: Explain

Naive Classification

  • Use either K-NN or Naive Bayes to predict whether a college is Private.
  • Explain your choice of technique.
  • Report on your Kappa value.
# Your code here
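
As a minimal sketch of the K-NN option, the code below uses class::knn with a stratified train/test split from caret. The seed, the 80/20 split, and k = 5 are arbitrary assumptions, not values prescribed by the assignment. The predictors are standardized with training-set statistics because K-NN relies on distances, and unscaled columns such as Apps would otherwise dominate.

```r
library(ISLR)   # provides the College dataframe
library(class)  # knn()
library(caret)  # createDataPartition(), confusionMatrix()

set.seed(505)  # arbitrary seed, for reproducibility only

# All columns except the label are numeric predictors.
numeric_cols <- setdiff(names(College), "Private")

# Stratified 80/20 split on the label.
train_idx <- createDataPartition(College$Private, p = 0.8, list = FALSE)[, 1]
train_y <- College$Private[train_idx]
test_y  <- College$Private[-train_idx]

# Standardize with the training mean and sd, then reuse them on the test set.
train_x <- scale(College[train_idx, numeric_cols])
test_x  <- scale(College[-train_idx, numeric_cols],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Classify each test college by majority vote among its 5 nearest neighbors.
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Kappa measures agreement beyond what chance alone would produce.
cm <- confusionMatrix(data = pred, reference = test_y)
kappa_value <- unname(cm$overall["Kappa"])
print(kappa_value)
```

Kappa is more informative than raw accuracy here because the classes are imbalanced: most colleges in the data are private, so a classifier that always answers "Yes" would already look accurate.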

TODO: Explain

Improved Classification

  • Predict whether a college is Private.
  • Use model weights.
  • Display and comment on an ROC curve.
# Your code here
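
The following sketch reads "model weights" as per-observation case weights that balance the two classes. That reading is an assumption, as are the choice of logistic regression, the four predictors, the seed, and the case_weight column name. The ROC curve is drawn with the pROC package, which is not loaded in the Setup section but is normally installed alongside caret.

```r
library(ISLR)  # provides the College dataframe
library(pROC)  # roc(), auc(); assumed to be installed

set.seed(505)  # arbitrary seed, for reproducibility only

# Simple random 80/20 split.
n <- nrow(College)
train_idx <- sample(n, size = round(0.8 * n))
train <- College[train_idx, ]
test  <- College[-train_idx, ]

# Inverse-frequency case weights: the rarer class gets the larger weight.
class_freq <- table(train$Private)
train$case_weight <- as.numeric(1 / class_freq[as.character(train$Private)])
train$case_weight <- train$case_weight / mean(train$case_weight)

# Weighted logistic regression; quasibinomial gives the same coefficients as
# binomial but avoids the non-integer-weights warning.
fit <- glm(Private ~ Outstate + F.Undergrad + S.F.Ratio + perc.alumni,
           data = train, family = quasibinomial, weights = case_weight)

# Predicted probability that each test college is private.
prob_private <- predict(fit, newdata = test, type = "response")

# ROC curve with "No" as controls and "Yes" as cases.
roc_obj <- roc(response = test$Private, predictor = prob_private,
               levels = c("No", "Yes"), direction = "<")
plot(roc_obj)
auc_value <- as.numeric(auc(roc_obj))
print(auc_value)
```

The curve shows the trade-off between sensitivity and specificity across all probability thresholds, and the area under it summarizes that trade-off in a single number between 0 and 1.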

TODO: Explain

Ethics

  • Based on your analysis, comment on the for-profit privatization of education, perhaps through the framework advanced by this article:

In mid-May 2018, The New York Times reported that under DeVos, the size of the team investigating abuses and fraud by for-profit colleges was reduced from about twelve members under the Obama administration to three, with their task also being scaled back to “processing student loan forgiveness applications and looking at smaller compliance cases”.

  • Discuss the civic responsibilities of data scientists for:
    • Big Data and Human-Centered Computing
    • Democratic Institutions
    • Education and Educational Policy
  • Provide at least one statistical measure for each, such as an RMSE, Kappa value, or ROC curve.

Big Data and Human-Centered Computing

TODO: Big Data and Human-Centered Computing

# Your code here

Democratic Institutions

TODO: Democratic Institutions

# Your code here

Education and Educational Policy

TODO: Education and Educational Policy

# Your code here