Abstract:
This is a technical blog post, available as both an HTML file and a .qmd file, hosted on GitHub Pages.
Setup
- Change the author of this RMD file to be yourself and delete this line.
- Modify the code below, if necessary, so that you can successfully load wine.rds, then delete this line.
- In the space provided after the R chunk, explain what the code is doing (line by line), then delete this line.
- Get your GitHub Pages ready.
Set Up Code:
library(tidyverse)
library(caret)
library(fastDummies)
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/wine.rds")))
Explanation:
TODO: write your line-by-line explanation of the code here
Feature Engineering
We begin by engineering a number of features.
- Create a total of 10 features (including points).
- Remove all rows with a missing value.
- Ensure that log(price) and the engineered features are the only columns that remain in the wino dataframe.
wino <- wine %>%
  mutate(lprice = log(price))
  # engineer features here
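The three requirements above (derive features, drop incomplete rows, keep only the response and the features) can be sketched on a small toy table. This is a minimal illustration, not the intended solution: the toy_wine data, its column names (year, country), and the features age and is_us are assumptions made for the example, and only three features are shown rather than ten.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the wine data; these column names and values are assumptions.
toy_wine <- tibble(
  price   = c(20, 35, NA, 60, 15),
  points  = c(88, 91, 85, 94, 86),
  year    = c(2012, 2015, 2010, 2008, 2016),
  country = c("US", "France", "US", "Italy", "US")
)

toy_wino <- toy_wine %>%
  mutate(
    lprice = log(price),                   # response variable
    age    = 2020 - year,                  # hypothetical feature: years before an arbitrary reference year
    is_us  = as.integer(country == "US")   # hypothetical binary indicator
  ) %>%
  select(lprice, points, age, is_us) %>%   # keep only the response and the features
  drop_na()                                # remove rows with any missing value

print(toy_wino)
```

Selecting columns before calling drop_na() means rows are discarded only when a value is missing in a column that is actually kept, which tends to preserve more data than dropping incomplete rows first.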
Caret
We now use a train/test split to evaluate the features.
- Use the Caret library to partition the wino dataframe into an 80/20 split.
- Run a linear regression with bootstrap resampling.
- Report RMSE on the test partition of the data.
# TODO: hint: Check the slides.
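One possible shape for the split, fit, and evaluation steps is sketched below. It runs on simulated data so that it stands alone; the toy data frame, its columns, the seed, and the choice of 25 bootstrap resamples are assumptions for illustration and may differ from what the slides use.

```r
library(caret)

set.seed(505)  # arbitrary seed so the split and resamples are reproducible

# Simulated stand-in for wino; the columns and coefficients are assumptions.
n <- 200
toy <- data.frame(
  points = round(runif(n, 80, 100)),
  age    = sample(1:20, n, replace = TRUE)
)
toy$lprice <- 0.1 * toy$points + 0.02 * toy$age - 6 + rnorm(n, sd = 0.3)

# 80/20 partition; list = FALSE returns row indices as a matrix
idx       <- createDataPartition(toy$lprice, p = 0.8, list = FALSE)
train_set <- toy[idx, ]
test_set  <- toy[-idx, ]

# Linear regression evaluated with bootstrap resampling on the training set
fit <- train(
  lprice ~ .,
  data      = train_set,
  method    = "lm",
  trControl = trainControl(method = "boot", number = 25)
)

# RMSE on the held-out test partition
pred <- predict(fit, newdata = test_set)
rmse <- postResample(pred = pred, obs = test_set$lprice)[["RMSE"]]
print(rmse)
```

The bootstrap resampling inside train() only estimates performance on the training partition. The figure to report is the RMSE computed on test_set, which the model never saw during fitting.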
Variable selection
We now graph the importance of your 10 features.
# TODO: hint: Check the slides.
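A hedged sketch of an importance plot with caret's varImp() follows. It again uses simulated data so it can run on its own; the toy columns points and noise are assumptions, with noise deliberately unrelated to the response so the contrast is visible.

```r
library(caret)

set.seed(505)  # arbitrary seed for reproducibility

# Simulated stand-in for wino: one informative feature and one pure-noise feature.
n <- 200
toy <- data.frame(points = runif(n, 80, 100), noise = rnorm(n))
toy$lprice <- 0.1 * toy$points + rnorm(n, sd = 0.3)

fit <- train(
  lprice ~ .,
  data      = toy,
  method    = "lm",
  trControl = trainControl(method = "boot", number = 25)
)

# For linear models, caret bases importance on the absolute t-statistic
# of each coefficient; scale = TRUE rescales the scores to 0-100.
imp <- varImp(fit, scale = TRUE)
print(imp)
plot(imp)
```

With a real model fitted on the ten engineered features, the same two calls, varImp() followed by plot(), should produce one bar per feature, ordered by importance.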