# TODO: hint: m1 <- lm(lprice ~ points + cherry)
Abstract:
This is a technical blog post, consisting of both an HTML file and a .qmd file, hosted on GitHub Pages.
Setup
- Change the author of this RMD file to be yourself and delete this line.
- Modify, if necessary, the code below so that you can successfully load wine.rds, then delete this line.
- In the space provided after the R chunk, explain what the code is doing (line by line), then delete this line.
- Get your GitHub Pages ready.
Setup Code:
library(tidyverse) # change r to {r} to run this block, then remove this comment
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/wine.rds"))) %>%
  filter(province=="Oregon" | province=="California" | province=="New York") %>%
  mutate(cherry=as.integer(str_detect(description,"[Cc]herry"))) %>%
  mutate(lprice=log(price)) %>%
  select(lprice, points, cherry, province)
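As an aid for the line-by-line explanation, here is a minimal sketch of the same four steps in base R on a tiny, hypothetical data frame (the `toy` rows are invented for illustration and are not taken from wine.rds). It assumes each tidyverse verb has the base-R counterpart named in the comments.

```r
# Hypothetical toy data; the real wine.rds has many more rows and columns.
toy <- data.frame(
  province    = c("Oregon", "California", "New York", "Bordeaux"),
  description = c("Bright cherry and spice", "Cherry cola notes",
                  "Crisp apple", "Dark cherry"),
  price       = c(20, 45, 15, 60),
  points      = c(88, 92, 85, 94),
  stringsAsFactors = FALSE
)

# Keep only the three provinces of interest (same idea as filter()).
toy <- toy[toy$province %in% c("Oregon", "California", "New York"), ]

# 1 if the description mentions "cherry" or "Cherry", otherwise 0
# (same idea as as.integer(str_detect(...))).
toy$cherry <- as.integer(grepl("[Cc]herry", toy$description))

# Natural log of price (same idea as mutate(lprice = log(price))).
toy$lprice <- log(toy$price)

# Keep only the modelling columns (same idea as select()).
toy <- toy[, c("lprice", "points", "cherry", "province")]
print(toy)
```

The Bordeaux row is dropped by the province filter, and the "Crisp apple" row gets `cherry = 0` because its description does not match the pattern.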
Explanation:
TODO: write your line-by-line explanation of the code here
Multiple Regression
Linear Models
First, run a linear regression with the log of price as the dependent variable and ‘points’ and ‘cherry’ as features (variables).
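The following is a minimal sketch of fitting such a model and computing its RMSE. It runs on simulated data, not on the wine data: the data frame `sim`, the coefficients used to generate it, and the helper `rmse()` are all assumptions made for illustration. The same two calls should carry over to the `wine` data frame.

```r
set.seed(505)  # arbitrary seed so the simulated data are reproducible
n <- 200
sim <- data.frame(
  points = round(runif(n, 80, 100)),
  cherry = rbinom(n, 1, 0.3)
)
# Made-up data-generating process, for illustration only.
sim$lprice <- -5.5 + 0.1 * sim$points + 0.2 * sim$cherry + rnorm(n, sd = 0.4)

# Additive model: log price explained by points and the cherry indicator.
m1 <- lm(lprice ~ points + cherry, data = sim)

# Hypothetical helper: in-sample root mean squared error of a fitted model.
rmse <- function(model) sqrt(mean(residuals(model)^2))
rmse_m1 <- rmse(m1)

print(coef(m1))
print(rmse_m1)
```

Because the response is log price, the RMSE is in log units: a value of 0.4, for example, means a typical prediction is off by a factor of roughly exp(0.4) in price.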
Explanation:
TODO: write your line-by-line explanation of the code here
TODO: report and explain the RMSE
Interaction Models
Add an interaction between ‘points’ and ‘cherry’.
# TODO: hint: Check the slides.
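As a rough sketch of what an interaction model looks like in R, the block below fits `points * cherry` on simulated data and compares its in-sample RMSE with the additive model. The data frame `sim` and its generating coefficients are invented for illustration; only `lm()` and the formula syntax are meant to transfer to the wine data.

```r
set.seed(505)  # arbitrary seed so the simulated data are reproducible
n <- 200
sim <- data.frame(
  points = round(runif(n, 80, 100)),
  cherry = rbinom(n, 1, 0.3)
)
# Made-up process in which the points slope differs for cherry wines.
sim$lprice <- -5.5 + 0.1 * sim$points +
  (-1.5 + 0.02 * sim$points) * sim$cherry + rnorm(n, sd = 0.4)

# points * cherry expands to points + cherry + points:cherry.
m1 <- lm(lprice ~ points + cherry, data = sim)
m2 <- lm(lprice ~ points * cherry, data = sim)

rmse_m1 <- sqrt(mean(residuals(m1)^2))
rmse_m2 <- sqrt(mean(residuals(m2)^2))

print(coef(m2))
print(c(additive = rmse_m1, interaction = rmse_m2))
```

Since the additive model is nested inside the interaction model, the in-sample RMSE of `m2` can never be larger than that of `m1`; whether the improvement matters is a separate question.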
TODO: write your line-by-line explanation of the code here
TODO: report and explain the RMSE
The Interaction Variable
TODO: interpret the coefficient on the interaction variable.
Explain as you would to a non-technical manager.
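One way to organize the interpretation is to write the interaction model out. The notation below is an assumption introduced here (the coefficient names are not from the assignment), and the last line relies on the response being the natural log of price.

```latex
\begin{aligned}
\text{lprice} &= \beta_0 + \beta_1\,\text{points} + \beta_2\,\text{cherry}
  + \beta_3\,(\text{points}\times\text{cherry}) + \varepsilon \\
\mathbb{E}[\text{lprice}\mid \text{cherry}=1,\ \text{points}]
  - \mathbb{E}[\text{lprice}\mid \text{cherry}=0,\ \text{points}]
  &= \beta_2 + \beta_3\,\text{points} \\
\text{implied price multiplier for cherry wines at a given score}
  &= e^{\beta_2 + \beta_3\,\text{points}}
\end{aligned}
```

Read this way, the interaction coefficient is the amount by which the effect of ‘cherry’ on log price changes for each additional point. Equivalently, the slope on points is \beta_1 for wines without cherry notes and \beta_1 + \beta_3 for wines with them.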
Applications
Determine in which province (Oregon, California, or New York) the ‘cherry’ feature in the data affects price most.
# TODO:
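One possible approach, sketched here on simulated data, is to fit the same model separately within each province and compare the ‘cherry’ coefficients. The data frame `sim`, the `true_effect` values, and the choice of a per-province additive model are all assumptions for illustration; an interaction between `cherry` and `province` in a single model would be an equally reasonable design.

```r
set.seed(505)  # arbitrary seed so the simulated data are reproducible
provinces <- c("Oregon", "California", "New York")
n <- 300
sim <- data.frame(
  province = sample(provinces, n, replace = TRUE),
  points   = round(runif(n, 80, 100)),
  cherry   = rbinom(n, 1, 0.3),
  stringsAsFactors = FALSE
)
# Made-up cherry effects per province, for illustration only.
true_effect <- c(Oregon = 0.30, California = 0.15, "New York" = 0.05)
sim$lprice <- -5.5 + 0.1 * sim$points +
  unname(true_effect[sim$province]) * sim$cherry + rnorm(n, sd = 0.3)

# Fit the same model within each province and keep the cherry coefficient.
cherry_by_province <- sapply(provinces, function(p) {
  fit <- lm(lprice ~ points + cherry, data = sim[sim$province == p, ])
  coef(fit)[["cherry"]]
})

print(round(cherry_by_province, 3))
# Province with the largest estimated effect in absolute value.
print(names(which.max(abs(cherry_by_province))))
```

The estimates are noisy, so it is worth checking standard errors (for example with `summary(fit)`) before declaring one province the winner.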
TODO: write your line-by-line explanation of the code here, and explain your answer.
Scenarios
On Accuracy
Imagine a model to distinguish New York wines from those in California and Oregon. After a few days of work, you take some measurements and note: “I’ve achieved 91% accuracy on my model!”
Should you be impressed? Why or why not?
# TODO: Use simple descriptive statistics from the data to justify your answer.
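A useful descriptive statistic here is the class balance, since it gives the accuracy of a model that always predicts the majority class. The sketch below uses hypothetical counts (the `province` vector is invented, not taken from the wine data); on the real data the same lines would be applied to `wine$province`.

```r
# Hypothetical class counts, for illustration only.
province <- c(rep("California", 60), rep("Oregon", 30), rep("New York", 10))

# Share of wines that are from New York.
is_ny <- province == "New York"
prop_ny <- mean(is_ny)

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy <- max(prop_ny, 1 - prop_ny)

print(table(province))
print(baseline_accuracy)
```

With these made-up counts, always answering "not New York" is right 90% of the time, so a reported accuracy only means something relative to that baseline.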
TODO: describe your reasoning here
On Ethics
Why is understanding this vignette important for using machine learning in an ethical manner?
TODO: describe your reasoning here
Ignorance is no excuse
Imagine you are working on a model to predict the likelihood that an individual loses their job as a result of changing federal policy under new presidential administrations. You have a very large dataset with many hundreds of features, but you are worried that including indicators like age, income, or gender might pose some ethical problems. When you discuss these concerns with your boss, she tells you to simply drop those features from the model. Does this solve the ethical issue? Why or why not?
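One thing worth testing when thinking about this scenario is whether a dropped attribute can still be recovered from the features that remain. The sketch below is purely illustrative: `gender` and `proxy` are simulated, and the strength of their relationship is an assumption, not a claim about any real dataset.

```r
set.seed(505)  # arbitrary seed so the simulated data are reproducible
n <- 1000

# Hypothetical protected attribute that has been dropped from the model.
gender <- rbinom(n, 1, 0.5)

# Hypothetical remaining feature that happens to correlate with it.
proxy <- 2 * gender + rnorm(n)

# Try to predict the dropped attribute from the remaining feature.
fit <- glm(gender ~ proxy, family = binomial)
pred <- as.integer(fitted(fit) > 0.5)
recovery_accuracy <- mean(pred == gender)

print(recovery_accuracy)
```

If the recovery accuracy is well above the 50% base rate, the information in the dropped column is still available to the model through its proxies.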
TODO: describe your reasoning here