Classification

Author

Your name here!

Published

February 24, 2025

Abstract:

This technical blog post is available as both an HTML file and a .qmd file, hosted on GitHub Pages.

0. Quarto Type-setting

  • This document is rendered with Quarto and configured to embed images using the embed-resources option in the header.
  • If you wish to use a similar header, here is the format specification for this document:
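
A minimal sketch of such a header, assuming a standard Quarto HTML format. The title, author, and date values are placeholders taken from this page, and the header actually used for this document may set further options:

```yaml
---
title: "Classification"
author: "Your name here!"
date: "02/24/2025"
format:
  html:
    embed-resources: true  # bundle images and other assets into the single HTML file
---
```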

1. Setup

Setup Code:

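# Short alias to silence package startup messages
sh <- suppressPackageStartupMessages
sh(library(tidyverse))
sh(library(caret))
sh(library(naivebayes))
# Load the pinot wine dataset used throughout this post
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))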

2. Logistic Concepts

Why do we call it Logistic Regression even though we are using the technique for classification?

TODO: Explain.

3. Modeling

We train a logistic regression model to classify whether a wine comes from Marlborough using:

  1. An 80-20 train-test split.
  2. Three features engineered from the description.
  3. 5-fold cross validation.

We report Kappa after using the model to predict provinces in the holdout sample.
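
The steps above could be sketched roughly as follows. This is one possible approach, not the required solution: the three keyword features (`cherry`, `earth`, `spice`), the seed, and the object names are hypothetical choices made for illustration, and the sketch assumes the dataset has `province` and `description` columns.

```r
library(tidyverse)
library(caret)

wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))

set.seed(505)  # arbitrary seed so the split is reproducible

# Binary target plus three keyword features (hypothetical word choices)
wino <- wine %>%
  mutate(description = str_to_lower(description)) %>%
  transmute(
    marlborough = factor(if_else(province == "Marlborough", "yes", "no"),
                         levels = c("no", "yes")),
    cherry = as.integer(str_detect(description, "cherry")),
    earth  = as.integer(str_detect(description, "earth")),
    spice  = as.integer(str_detect(description, "spice"))
  )

# 80-20 train-test split, stratified on the target
idx <- createDataPartition(wino$marlborough, p = 0.8, list = FALSE)[, 1]
wine_tr <- wino[idx, ]
wine_te <- wino[-idx, ]

# Logistic regression tuned and evaluated with 5-fold cross validation
fit <- train(marlborough ~ .,
             data = wine_tr,
             method = "glm",
             family = "binomial",
             metric = "Kappa",
             trControl = trainControl(method = "cv", number = 5))

# Kappa on the holdout sample
pred <- predict(fit, newdata = wine_te)
kappa <- confusionMatrix(pred, wine_te$marlborough, positive = "yes")$overall["Kappa"]
kappa
```

Because Marlborough wines are a small share of the data, simple keyword features like these may yield a low Kappa; the point of the sketch is the workflow, not the feature choice.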

# TODO

4. Binary vs Other Classification

What is the difference between performing classification with logistic regression and with methods like \(K\)-NN and Naive Bayes?

TODO: Explain.

5. ROC Curves

We can display an ROC curve for the model to illustrate its quality.
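
One way to draw such a curve is sketched below. It assumes the pROC package is installed (it is not loaded in the setup code above), refits a plain `glm` so the block stands on its own, and again uses hypothetical keyword features (`cherry`, `earth`, `spice`) and an arbitrary seed.

```r
library(tidyverse)
library(caret)
library(pROC)  # assumption: not part of this post's setup code

wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))

set.seed(505)  # arbitrary seed for a reproducible split

# Binary target and illustrative keyword features
wino <- wine %>%
  mutate(description = str_to_lower(description)) %>%
  transmute(
    marlborough = factor(if_else(province == "Marlborough", "yes", "no"),
                         levels = c("no", "yes")),
    cherry = as.integer(str_detect(description, "cherry")),
    earth  = as.integer(str_detect(description, "earth")),
    spice  = as.integer(str_detect(description, "spice"))
  )

idx <- createDataPartition(wino$marlborough, p = 0.8, list = FALSE)[, 1]

# Fit on the training rows; predict P(marlborough == "yes") on the holdout rows
fit  <- glm(marlborough ~ ., data = wino[idx, ], family = binomial)
prob <- predict(fit, newdata = wino[-idx, ], type = "response")

# Build the ROC curve: "no" is the control level, "yes" is the case level
roc_obj <- roc(response = wino$marlborough[-idx],
               predictor = prob,
               levels = c("no", "yes"),
               direction = "<")

plot(roc_obj, legacy.axes = TRUE)  # x-axis shown as 1 - specificity
auc(roc_obj)                       # area under the curve
```

An AUC near 0.5 indicates a model no better than random guessing, while values closer to 1 indicate better separation of the two classes.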

# You can find a tutorial on ROC curves here: https://towardsdatascience.com/understanding-the-roc-curve-and-auc-dd4f9a192ecb/

TODO: Explain.