sh <- suppressPackageStartupMessages
sh(library(tidyverse))
sh(library(caret))
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))
Abstract:
This is a technical blog post published as both an HTML file and a .qmd file, hosted on GitHub Pages.
0. Quarto Type-setting
- This document is rendered with Quarto and configured to embed images using the embed-resources option in the header.
- If you wish to use a similar header, here is the format specification for this document:
format:
  html:
    embed-resources: true
1. Setup
Set Up Code:
2. Conditional Probability
Calculate the probability that a Pinot comes from Burgundy given it has the word ‘fruit’ in the description.
\[ P({\rm Burgundy}~|~{\rm Fruit}) \]
# TODO
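One possible way to approach this is to restrict attention to the wines whose description mentions "fruit" and take the share of those that come from Burgundy, which follows from the definition

\[ P({\rm Burgundy}~|~{\rm Fruit}) = \frac{P({\rm Burgundy} \cap {\rm Fruit})}{P({\rm Fruit})} \]

The sketch below is only an illustration. It assumes the data has `province` and `description` columns and that the province is labelled exactly "Burgundy"; the `toy` data frame is a hypothetical stand-in for `wine`, not the real data.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical toy data standing in for `wine`; column names are assumed.
toy <- tibble(
  province = c("Burgundy", "Burgundy", "Oregon", "California", "Oregon"),
  description = c(
    "Red fruit and earth",
    "Firm tannins",
    "Bright fruit",
    "Dark fruit and oak",
    "Spice and smoke"
  )
)

p_burgundy_given_fruit <- toy %>%
  # Flag descriptions that mention "fruit" (case-insensitive).
  mutate(fruit = str_detect(str_to_lower(description), "fruit")) %>%
  # Conditioning on Fruit means keeping only the flagged rows.
  filter(fruit) %>%
  # The share of those rows from Burgundy estimates P(Burgundy | Fruit).
  summarise(p = mean(province == "Burgundy")) %>%
  pull(p)

print(p_burgundy_given_fruit)
```

Replacing `toy` with `wine` should give the estimate for the real data, provided the column names match. Note that `str_detect` on "fruit" also matches words such as "fruity" or "fruits".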
3. Naive Bayes Algorithm
We train a Naive Bayes algorithm to classify a wine’s province using:
1. An 80-20 train-test split.
2. Three features engineered from the description.
3. 5-fold cross-validation.
We report Kappa after using the model to predict provinces in the holdout sample.
# TODO
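The steps above could be sketched with caret roughly as follows. This is a minimal illustration under several assumptions: the data is synthetic (the `make_description` helper and the `toy` data frame are hypothetical stand-ins for `wine`), the three word features ("earth", "cherry", "oak") are arbitrary choices rather than the features the assignment intends, and the `naivebayes` package is assumed to be installed so that caret's `"naive_bayes"` method is available.

```r
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(caret))

set.seed(505)

# Hypothetical generator: each province mentions each word with a
# different probability, so the features carry some signal.
make_description <- function(province) {
  p <- switch(province,
    Burgundy   = c(earth = 0.8, cherry = 0.3, oak = 0.2),
    California = c(earth = 0.2, cherry = 0.7, oak = 0.6),
    Oregon     = c(earth = 0.3, cherry = 0.5, oak = 0.1)
  )
  words <- names(p)[runif(length(p)) < p]
  paste(c("pinot", words), collapse = " ")
}

n <- 300
toy <- tibble(
  province = sample(c("Burgundy", "California", "Oregon"), n, replace = TRUE)
) %>%
  mutate(description = map_chr(province, make_description))

# Step 2: three features engineered from the description.
wino <- toy %>%
  mutate(
    province = factor(province),
    earth    = str_detect(description, "earth"),
    cherry   = str_detect(description, "cherry"),
    oak      = str_detect(description, "oak")
  ) %>%
  select(-description)

# Step 1: stratified 80-20 train-test split.
idx <- createDataPartition(wino$province, p = 0.8, list = FALSE)[, 1]
train_set <- wino[idx, ]
test_set  <- wino[-idx, ]

# Step 3: Naive Bayes tuned with 5-fold cross-validation, selecting on Kappa.
fit <- train(
  province ~ .,
  data      = train_set,
  method    = "naive_bayes",
  metric    = "Kappa",
  trControl = trainControl(method = "cv", number = 5)
)

# Kappa on the holdout sample.
cm <- confusionMatrix(predict(fit, test_set), test_set$province)
kappa <- cm$overall[["Kappa"]]
print(kappa)
```

With the real data, the `toy` construction would be dropped and `wine` used in its place; the Kappa obtained then depends entirely on which three features are engineered.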
4. Frequency Differences
We find the three words that most distinguish New York Pinots from all other Pinots.
# TODO
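One way to look for distinguishing words is to compare each word's relative frequency in New York descriptions against its relative frequency everywhere else, and keep the words with the largest positive difference. The sketch below assumes the province label is "New_York" and uses a deliberately simple tokenizer (lowercase runs of letters, no stop-word removal); the `toy` data frame is a hypothetical stand-in for `wine`.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical toy data standing in for `wine`; labels are assumed.
toy <- tibble(
  province = c("New_York", "New_York", "New_York",
               "Oregon", "California", "Burgundy"),
  description = c(
    "Tart cranberry and lake breeze",
    "Tart cranberry with lake minerality",
    "Cranberry and lake stone",
    "Cherry and spice",
    "Ripe cherry and cola",
    "Earthy cherry and spice"
  )
)

top_words <- toy %>%
  mutate(
    is_ny = province == "New_York",
    # Simple tokenizer: lowercase, then extract runs of letters.
    word  = str_extract_all(str_to_lower(description), "[a-z]+")
  ) %>%
  unnest(word) %>%
  count(is_ny, word) %>%
  # Relative frequency of each word within its group (NY vs. the rest).
  group_by(is_ny) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  # One row per word, with columns ny_TRUE and ny_FALSE.
  pivot_wider(names_from = is_ny, values_from = freq,
              values_fill = 0, names_prefix = "ny_") %>%
  mutate(diff = ny_TRUE - ny_FALSE) %>%
  slice_max(diff, n = 3, with_ties = FALSE)

print(top_words)
```

On real descriptions it would probably help to remove stop words first, and a log ratio of frequencies is a reasonable alternative to the raw difference used here.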
5. Extension
Either do this as a bonus problem, or delete this section.
Calculate the variance of the logged word-frequency distributions for each province.
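A possible starting point, assuming "logged word-frequency distribution" means the log of each word's relative frequency within a province: count words per province, convert counts to relative frequencies, take logs, and compute the variance per province. The `toy` data frame and the simple letter-run tokenizer are hypothetical choices for illustration only.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical toy data standing in for `wine`.
toy <- tibble(
  province = c("Oregon", "Burgundy", "Burgundy"),
  description = c(
    "cherry cherry spice",
    "earth earth earth cherry",
    "cherry spice"
  )
)

log_freq_var <- toy %>%
  # Simple tokenizer: lowercase, then extract runs of letters.
  mutate(word = str_extract_all(str_to_lower(description), "[a-z]+")) %>%
  unnest(word) %>%
  count(province, word) %>%
  group_by(province) %>%
  # Variance of log relative frequency across the words of each province.
  summarise(var_log_freq = var(log(n / sum(n))), .groups = "drop")

print(log_freq_var)
```

Because dividing by the province total only shifts every log value by the same constant, the variance of the logged relative frequencies equals the variance of the logged raw counts.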