Regression

AI 101

First Things First

Takeaways - Colab

  • Use +
  • Use =
  • Use “run all”

Takeaways - Gemini

  • Be specific
  • Be precise
  • Be consistent

Takeaways - Both

  • Your “chatbot” will persist as long as you keep the window open
  • If you come back later or manually restart, it will change.
  • Some things should be repeated, others don’t need to be!
    • I write a “prefix” I use on all my prompts, and provide it each time.

Regression

[Infographic: linear vs. polynomial regression, by Dasani Madipalli]

Why Regression?

  • Not, in my view, an AI topic.
  • Not, in my view, really an ML topic.
  • It does, however, seem to be commonly considered an ML topic.
  • It is the basis for some more advanced ML/AI stuff.
  • We’ll get it out of the way quickly then move on.

What is Regression?


In statistical modeling, regression analysis is a statistical method for estimating the relationship between a dependent variable (often called the outcome or response variable, or a label in machine learning parlance) and one or more independent variables (often called regressors, predictors, covariates, explanatory variables or features).

Shorter

  • Given some predictor, determine an outcome.
    • Given hours studied, determine exam grade.
    • Given square footage, determine home price.
    • Given parental income, determine a child's adult income.

Two Types

  • Linear
    • The most common type
    • Relates a numeric predictor to a numeric outcome
      • Years of schooling to annual salary
  • Logistic
    • Special type
    • Relates a numeric predictor to a true/false outcome
      • Years of schooling vs. employed (or not)
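Logistic regression still uses a line internally, but passes the line's output through the logistic (sigmoid) function so the result lands between 0 and 1 and can be read as a probability. A minimal sketch; the intercept and slope below are made-up values chosen only for illustration, not fitted to any real employment data.

```python
import math

def logistic(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
INTERCEPT = -6.0
SLOPE = 0.5

def prob_employed(years_of_school: float) -> float:
    """Linear predictor (intercept + slope * x), mapped to a probability."""
    return logistic(INTERCEPT + SLOPE * years_of_school)

for years in (8, 12, 16):
    print(years, round(prob_employed(years), 3))
```

A true/false prediction is then typically made by thresholding, for example predicting "employed" whenever the probability exceeds 0.5.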

Machine learning

  • The real power of machine learning (ML) comes from training models.
  • Models are trained on historic data to automatically capture data dependencies.
  • They allow you to predict outcomes for new data.

ML

Prediction is hard, especially when it is about the future.

  • The core insight of ML is the ability to infer (or approximate) new knowledge, rather than merely describing existing, known facts.

Linear Regression

The Goal

  • The goal of a linear regression is to be able to plot a line, which…
  • Shows the relationship between the variables.
    • The predictor(s)
    • The outcome
  • Makes predictions.
    • What would we expect the GDP of a nation-state of 700 million people to be?

Least Squares

  • By social convention…
    • That is, not by mathematical law, but by historical accident…
  • …linear relationships are predicted using least-squares regression.
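For a single predictor, the least-squares line has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. A minimal sketch that computes it directly and checks it against NumPy's `np.polyfit`; the five data points are made up for illustration.

```python
import numpy as np

def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared vertical errors."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical toy data (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(least_squares_line(xs, ys))
print(np.polyfit(xs, ys, 1))  # NumPy returns [slope, intercept]
```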

The first known publication of a ‘Method of Averages’ was by the German astronomer Tobias Mayer in 1750.

Caution!

Look out!

Least squares and linear regression are related, but distinct.
  1. Linear regression is a way to answer a particular type of meaningful research question.
  2. Least squares is a method for (approximately) solving a system of equations.

  • There are other forms of numeric-to-numeric regression (e.g. polynomial).
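To make the distinction concrete: least squares approximately solves an overdetermined system A·c ≈ y, and both a straight line and a polynomial are fitted by that same machinery; only the columns of A change. A minimal sketch using NumPy's `np.linalg.lstsq`; the data points are made up for illustration.

```python
import numpy as np

# Hypothetical toy data (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.8, 4.2, 9.1, 16.9])

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, highest power first."""
    # Columns are x**degree, ..., x**1, x**0 (one equation per data point).
    A = np.vander(x, degree + 1)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

line = fit_polynomial(x, y, 1)      # linear regression
parabola = fit_polynomial(x, y, 2)  # polynomial regression, same solver
print(line)
print(parabola)
```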

Minimal Example

  • Take some data points and a candidate line.

Square it

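  • Take the vertical distance from each data point to the candidate line and square it; least squares picks the line that makes the sum of those squares as small as possible.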
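The "square it" step can be sketched as a function that scores a candidate line: sum the squared vertical distances, and a lower score means a better fit. The three points here are hypothetical and were chosen to lie exactly on y = 2x.

```python
def sum_of_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each point to y = slope*x + intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = slope * x + intercept
        total += (y - predicted) ** 2
    return total

# Hypothetical points that lie exactly on y = 2x.
xs = [1, 2, 3]
ys = [2, 4, 6]

print(sum_of_squared_errors(xs, ys, slope=2, intercept=0))  # 0.0: a perfect fit
print(sum_of_squared_errors(xs, ys, slope=1, intercept=0))  # 14.0 = 1 + 4 + 9
```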

In Colab

  • It is possible to show this in Colab/Python.
  • I will not quiz you on this but it’s cool.
  • Let’s take Star Wars episodes vs. year.
    • I made this by hand.
Episode Year
1 1999
2 2002
3 2005
4 1977
5 1980
6 1983
7 2015
8 2017
9 2019

Colab

  • I just asked Gemini to make this.
eps = [4, 5, 6, 1, 2, 3, 7, 8, 9]
yrs = [1977, 1980, 1983, 1999, 2002, 2005, 2015, 2017, 2019]

Plot it

  • I naively make a scatterplot to view the data…
from matplotlib import pyplot as plt

plt.scatter(eps,yrs)

Arithmetic

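import numpy as np

# Fit a degree-1 polynomial (a straight line) by least squares, wrapped as a callable
p = np.poly1d(np.polyfit(eps, yrs, 1))
plt.scatter(eps,yrs)
plt.plot(eps,p(eps))  # draw the fitted line over the data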
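It can help to look at the line the fit actually produced. A short sketch, reusing the episode data above, that unpacks the slope and intercept from the `np.poly1d` object via its `coeffs` attribute:

```python
import numpy as np

eps = [4, 5, 6, 1, 2, 3, 7, 8, 9]
yrs = [1977, 1980, 1983, 1999, 2002, 2005, 2015, 2017, 2019]

# np.polyfit returns coefficients highest power first: [slope, intercept].
p = np.poly1d(np.polyfit(eps, yrs, 1))
slope, intercept = p.coeffs
print(f"year ≈ {slope:.3f} * episode + {intercept:.2f}")  # year ≈ 2.517 * episode + 1987.08
```

In words: the line says each episode number adds roughly two and a half years.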

Predict

  • What year do we expect episode 10 to come out?
import numpy as np

p(10)
np.float64(2012.2499999999995)
  • Whoops.
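One way to put a number on "Whoops" is the coefficient of determination, R², the share of the variation in release years that the line accounts for (1 is a perfect fit; 0 is no better than always guessing the mean year). A sketch reusing the same data and fit:

```python
import numpy as np

eps = np.array([4, 5, 6, 1, 2, 3, 7, 8, 9])
yrs = np.array([1977, 1980, 1983, 1999, 2002, 2005, 2015, 2017, 2019])

p = np.poly1d(np.polyfit(eps, yrs, 1))
residuals = yrs - p(eps)

# R^2 = 1 - (unexplained variation / total variation)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((yrs - yrs.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 2))  # 0.18
```

The line explains less than a fifth of the variation, so a poor prediction for episode 10 is not surprising.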

Look at it

  • Do you see… any other patterns?
plt.scatter(eps,yrs)

Non-linearity

Original trilogy
Episode Year
4 1977
5 1980
6 1983

Prequel trilogy
Episode Year
1 1999
2 2002
3 2005

Sequel trilogy
Episode Year
7 2015
8 2017
9 2019
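The gaps between releases can be read off by subtracting consecutive release years, both within each trilogy and from the end of one trilogy to the start of the next (in release order). A small sketch using the same years:

```python
# Release years grouped by trilogy, listed in release order.
trilogies = {
    "original": [1977, 1980, 1983],
    "prequel": [1999, 2002, 2005],
    "sequel": [2015, 2017, 2019],
}

# Gaps between consecutive films within each trilogy.
within = {name: [b - a for a, b in zip(years, years[1:])]
          for name, years in trilogies.items()}

# Gap from the last film of one trilogy to the first film of the next.
ordered = list(trilogies.values())
between = [nxt[0] - prev[-1] for prev, nxt in zip(ordered, ordered[1:])]

print(within)   # {'original': [3, 3], 'prequel': [3, 3], 'sequel': [2, 2]}
print(between)  # [16, 10]
```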

I see…

  • Within trilogies, every 2-3 years, decreasing over time.
  • Between trilogies, gaps of 16 and then 10 years, decreasing over time.
  • Prequels before sequels, often.
  • I’d say episode 10
    • Next trilogy ~10 years after 2019.
    • 2029ish if there’s another prequel (High Republic) or 2044ish

Complication

  • Should we include Rogue One as:
    • Between Episodes 3 and 4
    • Released in 2016
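One way to explore the question is to add Rogue One to the data and refit. A sketch under an explicit assumption: Rogue One is encoded as a hypothetical "episode 3.5" (it has no official episode number, so this placement is itself a modelling choice).

```python
import numpy as np

eps = [4, 5, 6, 1, 2, 3, 7, 8, 9]
yrs = [1977, 1980, 1983, 1999, 2002, 2005, 2015, 2017, 2019]

# Hypothetical encoding: treat Rogue One as "episode 3.5", released in 2016.
eps_ro = eps + [3.5]
yrs_ro = yrs + [2016]

p = np.poly1d(np.polyfit(eps, yrs, 1))
p_ro = np.poly1d(np.polyfit(eps_ro, yrs_ro, 1))

# Compare the episode-10 predictions with and without Rogue One.
print(p(10), p_ro(10))
```

A different encoding (say, leaving it out, or numbering it differently) would give a different line, which is part of the point: the model only knows what we decide to feed it.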

Takeaways

  • Linear regression can be cool.
  • But it is not intelligent, probably.
    • Perhaps, if it appears intelligent, the site of the intelligence is actually the human user, not the computer.