sh <- suppressPackageStartupMessages
sh(library(tidyverse))
sh(library(caret))
sh(library(naivebayes))
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))
Abstract:
This technical blog post is available as both an HTML file and a .qmd file, hosted on GitHub Pages.
0. Quarto Type-setting
- This document is rendered with Quarto and configured to embed images using the embed-resources option in the header.
- If you wish to use a similar header, here is the format specification for this document:
format:
  html:
    embed-resources: true
1. Setup
Set Up Code:
2. Logistic Concepts
Why do we call it Logistic Regression even though we are using the technique for classification?
TODO: Explain.
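As a starting point for that explanation, here is the standard form of the model, written as a sketch. The symbols \(p\), \(\beta_j\), and \(x_j\) are notation introduced here, not taken from the rest of this post. The "regression" is a linear regression on the log-odds of the positive class; the classification only appears when the fitted probability is compared with a cutoff (0.5 is a common default, not a requirement).

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]
\[
\hat{y} =
\begin{cases}
1 & \text{if } p \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
\]
```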
3. Modeling
We train a logistic regression model to classify whether a wine comes from Marlborough using:
- An 80-20 train-test split.
- Three features engineered from the description.
- 5-fold cross-validation.
We report Kappa after using the model to predict provinces in the holdout sample.
# TODO
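The steps above could be sketched as follows. This is one possible layout, not the required solution. It assumes the data has `province` and `description` columns and that Marlborough is spelled `"Marlborough"`. The three keyword features (`f_bright`, `f_herb`, `f_tannin`), the seed, and the variable names are placeholders chosen for illustration.

```r
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(caret))

wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))

# Binary target plus three hypothetical keyword features from the description.
wino <- wine %>%
  mutate(
    marlborough = factor(if_else(province == "Marlborough", "yes", "no"),
                         levels = c("no", "yes")),
    desc     = str_to_lower(description),
    f_bright = str_detect(desc, "bright"),
    f_herb   = str_detect(desc, "herb"),
    f_tannin = str_detect(desc, "tannin")
  ) %>%
  select(marlborough, f_bright, f_herb, f_tannin)

# 80-20 train-test split, stratified on the target.
set.seed(505)
idx   <- createDataPartition(wino$marlborough, p = 0.8, list = FALSE)[, 1]
train <- wino[idx, ]
test  <- wino[-idx, ]

# Logistic regression (binomial glm) tuned with 5-fold cross-validation.
fit <- train(marlborough ~ .,
             data = train,
             method = "glm",
             family = "binomial",
             metric = "Kappa",
             trControl = trainControl(method = "cv", number = 5))

# Kappa on the holdout sample.
pred <- predict(fit, newdata = test)
cm   <- confusionMatrix(pred, test$marlborough, positive = "yes")
kappa_holdout <- unname(cm$overall["Kappa"])
print(kappa_holdout)
```

Swapping in better keywords is the main lever here; the split, resampling, and scoring code can stay the same.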
4. Binary vs Other Classification
What is the difference between classifying with logistic regression and classifying with methods like \(K\)-NN and Naive Bayes?
TODO: Explain.
5. ROC Curves
We can display an ROC curve for the model to show the model’s quality.
# You can find a tutorial on ROC curves here: https://towardsdatascience.com/understanding-the-roc-curve-and-auc-dd4f9a192ecb/
TODO: Explain.
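To make the construction of the curve concrete, here is a base-R sketch that builds an ROC curve by hand. The labels and scores are simulated stand-ins, not the wine data or this post's model; in practice `truth` would be the holdout labels and `score` the predicted probabilities. All variable names are illustrative.

```r
# Simulated stand-in for holdout labels and predicted probabilities.
set.seed(505)
n     <- 500
truth <- rbinom(n, 1, 0.3)                                   # 1 = positive class
score <- plogis(rnorm(n, mean = ifelse(truth == 1, 1, -1)))  # positives score higher on average

# Sweep the cutoff from strictest to loosest; record TPR and FPR at each step.
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) mean(score[truth == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(score[truth == 0] >= t))

# Area under the curve via the trapezoid rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (simulated scores)")
abline(0, 1, lty = 2)  # reference line for a classifier that guesses at random
print(auc)
```

A curve that hugs the top-left corner (AUC near 1) indicates that the scores separate the classes well; a curve near the dashed diagonal (AUC near 0.5) indicates performance close to guessing.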