Abstract:

This technical blog post is published as both an HTML file and a .qmd file hosted on GitHub Pages.

Setup

Change the author of this RMD file to be yourself and delete this line.
If necessary, modify the code below so that you can successfully load wine.rds, then delete this line.
In the space provided after the R chunk, explain what the code is doing (line by line), then delete this line.
Get your GitHub Pages ready.

Setup Code:

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(caret)

Loading required package: lattice

Attaching package: 'caret'

The following object is masked from 'package:purrr':

    lift

library(fastDummies)
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/wine.rds")))

Explanation:

TODO: write your line-by-line explanation of the code here

Feature Engineering

We begin by engineering a number of features.

Create a total of 10 features (including points).
Remove all rows with a missing value.
Ensure that log(price) and the engineered features are the only columns that remain in the wino dataframe.

wino <- wine %>% 
  mutate(lprice=log(price))
  # engineer features here
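
One possible way to complete this step is sketched below. It is only an illustration: the helper name `engineer_features` is hypothetical, and the column names `description`, `country`, `province`, `variety`, and `year` are assumptions about what wine.rds contains (only `price` and `points` are named in this post). The ten features, counting points, are chosen arbitrarily to show the pattern.

```r
library(tidyverse)

# Hypothetical helper: builds lprice plus 10 features (including points).
# ASSUMPTION: `wine` has columns price, points, year, description,
# country, province, and variety. Adjust the names to match your data.
engineer_features <- function(wine) {
  wine %>%
    mutate(
      lprice      = log(price),
      desc        = str_to_lower(description),
      desc_length = str_length(description),            # review length in characters
      fruit       = as.integer(str_detect(desc, "fruit")),
      oak         = as.integer(str_detect(desc, "oak")),
      us          = as.integer(country == "US"),
      france      = as.integer(country == "France"),
      california  = as.integer(province == "California"),
      pinot_noir  = as.integer(variety == "Pinot Noir"),
      chardonnay  = as.integer(variety == "Chardonnay")
    ) %>%
    # keep only the target and the 10 features
    select(lprice, points, year, desc_length, fruit, oak,
           us, france, california, pinot_noir, chardonnay) %>%
    # remove every row that has a missing value in any remaining column
    drop_na()
}
```

With `wine` loaded as in the setup chunk, `wino <- engineer_features(wine)` would then produce the dataframe used in the next section, provided the assumed columns exist.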

Caret

We now use a train/test split to evaluate the features.

Use the Caret library to partition the wino dataframe into an 80/20 split.
Run a linear regression with bootstrap resampling.
Report RMSE on the test partition of the data.

# TODO: hint: Check the slides.
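
The three steps above could be sketched as follows. This is a minimal version under stated assumptions, not necessarily the approach from the slides: the helper name `evaluate_features`, the seed value, and the choice of 25 bootstrap resamples are all illustrative, and `wino` is assumed to contain `lprice` plus numeric feature columns only.

```r
library(caret)

# Hypothetical helper: 80/20 split, bootstrapped linear regression, test RMSE.
# ASSUMPTION: `wino` holds the target `lprice` and the engineered features.
evaluate_features <- function(wino, seed = 505) {
  set.seed(seed)  # arbitrary seed so the split is reproducible

  # Row indices for the 80% training partition, stratified on lprice
  idx      <- createDataPartition(wino$lprice, p = 0.8, list = FALSE)[, 1]
  train_df <- wino[idx, ]
  test_df  <- wino[-idx, ]

  # Linear regression evaluated with bootstrap resampling
  fit <- train(lprice ~ .,
               data      = train_df,
               method    = "lm",
               trControl = trainControl(method = "boot", number = 25))

  # RMSE on the held-out 20%
  pred <- predict(fit, newdata = test_df)
  list(fit = fit, rmse = RMSE(pred, test_df$lprice))
}
```

Calling `res <- evaluate_features(wino)` and printing `res$rmse` would report the test-partition RMSE on the log-price scale; `res$fit` keeps the trained model for later inspection.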

Variable selection

We now graph the importance of your 10 features.

# TODO: hint: Check the slides.
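
A hedged sketch of the importance plot is given below. It assumes a model fitted with caret's `train()`; the helper name `importance_plot` and the plot title are hypothetical. For a linear model, caret's `varImp()` ranks predictors by the absolute value of their t-statistic, and `scale = TRUE` rescales the scores to a 0 to 100 range.

```r
library(caret)
library(ggplot2)

# Hypothetical helper: plots variable importance for a caret model.
# ASSUMPTION: `fit` is an object returned by caret::train().
importance_plot <- function(fit) {
  imp <- varImp(fit, scale = TRUE)  # importance scores rescaled to 0-100
  ggplot(imp) +
    labs(title = "Variable importance for log(price)")
}
```

Passing in the model trained on the 80% partition, for example `importance_plot(fit)`, should draw one bar per feature, ordered from most to least important.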