Goal
This is a generalized ML assignment
- Predict the profit of future products developed at CravenSpeed
- Using any model (or ensemble) you’d like
- Evaluated using RMSE on a holdout sample (which you won’t see)
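For reference, RMSE is the square root of the mean squared difference between actual and predicted values. A minimal sketch in base R follows; the `rmse` helper and the numbers are made up for illustration and are not part of the assignment data.

```r
# Root mean squared error between actual and predicted values.
# `rmse` is a hypothetical helper name, not something the assignment provides.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up profits, for illustration only.
actual    <- c(100, 250, -40, 80)
predicted <- c(110, 240, -20, 80)

rmse(actual, predicted)
```

Lower is better: large misses are penalized more heavily than small ones because errors are squared before averaging.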
Submission
- A maximum of 15 slides, without any code, that demonstrates:
- How and why you created/selected the features used,
- The choice and design of your model, and
- Results and insights.
- You must present from the Ford 102 “Teaching Machine”, from a public URL, with no login.
Presentations
- It is a simple matter to create a presentation within Quarto.
- Simply specify “revealjs” format. Read more
The page
final_page.qmd
---
title: "Final Page"
author: "Team $i$"
date: "04/21/2025"
---
# Goal
...
The presentation
final_present.qmd
---
title: "Final Presentation"
author: "Team $i$"
date: "04/21/2025"
format: revealjs
---
# Goal
...
Criteria
- Every group member must participate in the presentation
- Maximum 10 features including interactions
Setup
- You may use any libraries, but tidyverse and caret may be sufficient.
- If you wish, you may use Python, Julia, or Observable in any manner you see fit and I will figure out how to assess it.
- Recall - no code on slides! So it won’t matter.
library(tidyverse)
library(caret)
Dataframe
- We use the craven_train dataframe.
fast <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/refs/heads/master/dat/craven_train.rds")))
- You will necessarily perform some feature engineering as you see fit.
- Exactly ten (10) features.
- No relation to “Crazy Train”
An “engineer” function
- Besides the presentation
- Submit a .qmd or .rmd file that includes an “engineer” function
- It engineers your features over a data frame with the same columns as “craven_train.rds”.
fast <- readRDS("secret.rds") # I have "secret" data
fast <- fast %>% engineer() # I will apply your function.
A bad example
# Engineer 10 features
engineer <- function(df) {
  df |> select(1:10)
}
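By contrast, a more useful engineer function derives new columns and returns them first, since only the leading columns are kept during assessment. The sketch below runs on a toy data frame. The column names `price`, `weight`, and `category` are hypothetical and are not taken from craven_train, and only three features are built to keep it short.

```r
library(tidyverse)

# Toy stand-in for the real data; these column names are hypothetical.
toy <- tibble(
  price    = c(20, 35, 50, 15),
  weight   = c(1.0, 2.5, 4.0, 0.5),
  category = c("a", "b", "a", "c")
)

# Sketch of an engineer function: derive features, then return only them.
engineer <- function(df) {
  df |>
    mutate(
      log_price        = log(price),                   # compress skewed prices
      price_per_weight = price / weight,               # ratio feature
      is_cat_a         = as.integer(category == "a")   # indicator feature
    ) |>
    select(log_price, price_per_weight, is_cat_a)
}

toy |> engineer()
```

The function takes a data frame and returns a data frame, so it can be applied to any data with the same columns as the one it was written against.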
Setup
- Assessments will be set up as follows:
- “Profit” is engineered.
- Note that the first ten (10) features are selected.
- This ensures no more than 10 features are used.
- “Profit” is incorporated into the data frame.
profit <- fast["Revenue 2019 to present"] - fast["BOM Cost"] * fast["Units Sold"]
fast <- fast %>% engineer()
fast <- fast |> select(1:10) # Max 10 features
fast["Profit"] = profit
Assessment
- Assessments will be evaluated via RMSE over the secret data as follows:
train(Profit ~ .,
data = fast,
method = "lm",
trControl = trainControl(method = "cv", number = 5))$results$RMSE
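To rehearse the whole pipeline without the secret data, the steps can be run on simulated values. This is only a sketch: the data are random, the `price` column and the two-feature `engineer` function are hypothetical, and `[[` is used to pull columns out as plain vectors. Because the toy function returns fewer than ten columns, the selection step is guarded here; as far as I know, `select(1:10)` raises an error when fewer than ten columns exist.

```r
library(tidyverse)
library(caret)

set.seed(505)  # reproducible toy data and CV folds

# Simulated stand-in for the secret data; values are random, not real.
n <- 100
toy <- tibble(
  `Revenue 2019 to present` = runif(n, 1000, 5000),
  `BOM Cost`                = runif(n, 5, 20),
  `Units Sold`              = round(runif(n, 10, 100)),
  price                     = runif(n, 20, 80)  # hypothetical raw column
)

# Hypothetical engineer function returning two features.
engineer <- function(df) {
  df |>
    mutate(log_price = log(price), price_sq = price^2) |>
    select(log_price, price_sq)
}

# Same order as the assessment: profit first, then engineer, then fit.
profit <- toy[["Revenue 2019 to present"]] - toy[["BOM Cost"]] * toy[["Units Sold"]]
feats  <- toy |> engineer()
feats  <- feats[seq_len(min(10, ncol(feats)))]  # keep at most 10 columns
feats[["Profit"]] <- profit

fit <- train(Profit ~ .,
             data = feats,
             method = "lm",
             trControl = trainControl(method = "cv", number = 5))
fit$results$RMSE
```

The RMSE printed here says nothing about real performance, since the simulated profit is unrelated to the toy features; the point is only to check that an engineer function survives the assessment steps.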