Regression
AI 101
First Things First
Takeaways - Colab
- Use “run all”
Takeaways - Gemini
- Be specific
- Be precise
- Be consistent
Takeaways - Both
- Your “chatbot” will persist as long as you keep the window open
- If you come back later or manually restart, it will change
- Some things should be repeated, others don’t need to be!
- I write a “prefix” I use on all my prompts, and provide it each time.
Regression
Infographic by Dasani Madipalli
Why Regression?
- Not, in my view, an AI topic.
- Not, in my view, really an ML topic.
- It seems commonly to be considered an ML topic.
- It is the basis for some more advanced ML/AI stuff.
- We’ll get it out of the way quickly then move on.
What is Regression?
In statistical modeling, regression analysis is a statistical method for estimating the relationship between a dependent variable (often called the outcome or response variable, or a label in machine learning parlance) and one or more independent variables (often called regressors, predictors, covariates, explanatory variables or features).
Shorter
- Given some predictor, determine an outcome.
- Given hours studied, determine exam grade.
- Given square footage, determine home price.
- Given parental income, determine income.
Two Types
- Linear
- The most common type
- Relate a numeric to numeric
- Years of schooling to annual salary
- Logistic
- Special type
- Relate a numeric to true/false
- Years of school vs. employed (or not)
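To make the two types concrete, here is a minimal sketch of what each one outputs. The coefficients are made up for illustration and are not fitted to any real data. A linear model returns a number; a logistic model squashes the same kind of line into a probability between 0 and 1, which we then read as true/false.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
def linear_salary(years_of_school):
    """Linear: numeric in, numeric out (annual salary in dollars)."""
    return 10_000 + 4_000 * years_of_school

def logistic_employed(years_of_school):
    """Logistic: numeric in, probability of 'employed' out (between 0 and 1)."""
    z = -2.0 + 0.3 * years_of_school
    return 1 / (1 + math.exp(-z))

salary = linear_salary(16)
p = logistic_employed(16)
employed = p >= 0.5  # read the probability as true/false with a 0.5 cutoff
print(salary, round(p, 3), employed)
```

The 0.5 cutoff is also an assumption; it is a common default, not a rule.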
Machine learning
- The real power of machine learning (ML) comes from training models.
- Models are trained on historic data to automatically capture data dependencies.
- They allow you to predict outcomes for new data.
ML
Prediction is hard, especially when it is about the future.
- The core insight of ML is the ability to infer (or approximate) new knowledge, not describe existing, known facts.
Linear Regression
The Goal
- The goal of a linear regression is to be able to plot a line, which…
- Shows the relationship between the variables.
- The predictor(s)
- The outcome
- Makes predictions.
- What would we expect the GDP of a nation-state of 700 million people to be?
Least Squares
- By social convention…
- That is, not by mathematical law, but by historical accident…
- …linear relationships are predicted using least-squares regression.
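For reference, here is the quantity that least squares makes as small as possible, along with the standard solution for a single predictor. The symbols are my own labels, not from the slides: x_i is the predictor, y_i is the outcome, a is the intercept and b is the slope of the line.

```latex
\min_{a,\,b} \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2,
\qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```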
Caution!
Look out!
Least squares and linear regression are related, but distinct.
1. Linear regression is a way to answer a type of meaningful research question.
2. Least squares is a solution to a system of equations.
- There are other forms of numeric-to-numeric regression (e.g. polynomial).
Minimal Example
- Take some data points and a candidate line.
Square it
- Take the distance data-to-candidate and square it.
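A minimal sketch of that idea, assuming three made-up data points and two candidate lines that I picked by hand. The function name `sum_squared_error` is mine. Whichever candidate gives the smaller total is the better fit in the least-squares sense.

```python
# Made-up data points (x, y); not from any real dataset.
points = [(1, 2), (2, 3), (3, 5)]

def sum_squared_error(slope, intercept, data):
    """Add up the squared vertical distances from each point to the line."""
    total = 0.0
    for x, y in data:
        predicted = slope * x + intercept
        total += (y - predicted) ** 2
    return total

# Two candidate lines: the smaller total is the better fit.
candidate_a = sum_squared_error(1.0, 1.0, points)
candidate_b = sum_squared_error(1.5, 1 / 3, points)
print(candidate_a, candidate_b)
```

Squaring keeps points above and below the line from cancelling each other out, and it punishes big misses more than small ones.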
In Colab
- It is possible to show this in Colab/Python.
- I will not quiz you on this but it’s cool.
- Let’s take Star Wars episodes vs. year.
- I made this by hand.
| Episode | Year |
|---|---|
| 1 | 1999 |
| 2 | 2002 |
| 3 | 2005 |
| 4 | 1977 |
| 5 | 1980 |
| 6 | 1983 |
| 7 | 2015 |
| 8 | 2017 |
| 9 | 2019 |
Colab
- I just asked Gemini to make this.
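Something like the following would do the job. This is a sketch using plain Python lists, not the exact code Gemini produced; the variable names are my own.

```python
# Star Wars episode numbers and release years, copied from the table above.
episodes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
years = [1999, 2002, 2005, 1977, 1980, 1983, 2015, 2017, 2019]

# Pair them up so each row reads (episode, year), like the table.
rows = list(zip(episodes, years))
for episode, year in rows:
    print(episode, year)
```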
Plot it
- I naively make a scatterplot to view the data…
Arithmetic
- Arithmetic is hard, so just use a built-in function.
Predict
- What year do we expect episode 10 to come out?
- Whoops.
Look at it
- Do you see… any other patterns?
Non-linearity
| Episode | Year |
|---|---|
| 4 | 1977 |
| 5 | 1980 |
| 6 | 1983 |
| Episode | Year |
|---|---|
| 1 | 1999 |
| 2 | 2002 |
| 3 | 2005 |
| Episode | Year |
|---|---|
| 7 | 2015 |
| 8 | 2017 |
| 9 | 2019 |
I see…
- Within trilogies, every 2-3 years, decreasing over time.
- Between trilogies, every 16-10 years, decreasing over time.
- Prequels before sequels, often.
- I’d say episode 10
- Next trilogy in ~10 years after 2019.
- 2029ish if there’s another prequel (High Republic) or 2044ish
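The gaps I am eyeballing can be checked with a few lines of Python. This is a sketch: the grouping into trilogies and the "about 10 years" guess for the next gap are my assumptions, not something a model discovered.

```python
# Release years grouped by trilogy, in release order.
trilogies = {
    "original (4-6)": [1977, 1980, 1983],
    "prequel (1-3)": [1999, 2002, 2005],
    "sequel (7-9)": [2015, 2017, 2019],
}

# Gaps between consecutive films inside each trilogy.
within = {
    name: [later - earlier for earlier, later in zip(yrs, yrs[1:])]
    for name, yrs in trilogies.items()
}

# Gaps from the last film of one trilogy to the first film of the next.
ordered = list(trilogies.values())
between = [nxt[0] - prev[-1] for prev, nxt in zip(ordered, ordered[1:])]

# Assumption: the next gap is about 10 years, as guessed above.
next_trilogy_start = ordered[-1][-1] + 10

print(within)
print(between)
print(next_trilogy_start)
```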
Complication
- Should we include Rogue One as:
- Between Episodes 3 and 4
- Released in 2016
Takeaways
- Linear regression can be cool.
- But it is not intelligent, probably.
- Perhaps, if it appears intelligent, the site of the intelligence is actually the human user, not the computer.


