Welcome to the catalogue
The 3 flavors of ML and how each one shows up at Lincoln.
Watch this first: an 85-second primer for what's below.
What this whole catalogue does
Every chapter of the textbook Introduction to Statistical Learning becomes a short, interactive crash course. We strip the math down to what you actually need, anchor every concept in a Lincoln Industries problem, and end with a working model you can defend in a meeting.
You'll travel from "speak the language" (this module) to "build a real model on your own data" (final modules) without writing a single line of code along the way.
The three flavors of ML at Lincoln
Every machine learning problem on earth is one of three shapes. Once you can see the shape, you know which tool to reach for.
Regression: predicting a number. "What will the scrap rate be on Line 4 next week?"
Classification: predicting a label. "Will this part pass final QC, yes or no?"
Clustering: finding hidden groups in data. "Are there families of plating-line failures we keep seeing?"
Three shapes, three toolkits. The shape of your question decides which one you need.
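The catalogue stays no-code, but if you ever want to peek under the hood, here is a minimal scikit-learn sketch of the three shapes side by side. Every number and column name below is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

# Invented setup data: [bath_temp_C, line_speed_ppm] for six past jobs
X = np.array([[78, 12], [80, 13], [82, 14], [85, 16], [88, 18], [90, 19]])

# Regression: predict a number (scrap rate, %)
scrap_rate = np.array([2.1, 2.4, 2.6, 3.4, 4.1, 4.6])
reg = LinearRegression().fit(X, scrap_rate)
print("Predicted scrap rate:", reg.predict([[84, 15]]))

# Classification: predict a label (1 = pass final QC, 0 = fail)
passed_qc = np.array([1, 1, 1, 0, 0, 0])
clf = LogisticRegression().fit(X, passed_qc)
print("Predicted pass/fail:", clf.predict([[84, 15]]))

# Clustering: find hidden groups (note: no labels are given at all)
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Cluster assigned to each job:", groups)
```

Same table of inputs, three different questions; only the shape of the answer changes.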
Why this catalogue exists
Most ML resources assume you'll write code. Most non-coders bounce off in the first hour. This catalogue assumes the opposite: you have a job, you're busy, and you want to know what's actually possible and what's actually doable for you.
The path is simple: Foundations (now) → Linear Models → Resampling → Trees → Deep Learning → Unsupervised. Each module sharpens your judgment. The final modules ship you to an AutoML tool (Vertex AI, H2O, SageMaker Autopilot) where you push your own data through and ship a working model.
The AI agent monitoring shops you've heard about (Arize, Helicone, LangSmith) use exactly these three flavors. Regression to predict tool-call latency. Classification to flag whether an LLM response is hallucinating. Clustering to group similar failure modes across thousands of agent runs. Manufacturing and AI agents are using the same playbook.
Pick the most painful prediction problem on your floor. "Which jobs will run over schedule?" "Which baths need refreshing this week?" "Which suppliers slip first?"
Hold that one problem in your head through the rest of this module. Every concept we hit, ask yourself: "how would I apply this to my problem?" By the capstone you'll have an answer.
Lesson 1 recap
- ML at Lincoln boils down to three shapes: regression (numbers), classification (labels), clustering (hidden groups).
- This catalogue is concept-first. By the end you'll be able to push real Lincoln data through an AutoML tool, no coding required.
- Pick one painful prediction problem at work and carry it with you through every lesson.
In your own words: how would you explain this lesson's main concept to a peer at Lincoln? Save locally; have AI review for honest critique.
The data language
n, p, X, y: just enough notation to read papers and books.
Watch this first: a 70-second primer for what's below.
What the letters mean
Imagine a spreadsheet of Lincoln plating jobs. Every row is a job. Every column is something you measured about that job. There's one column you care about predicting. That's all four letters.
n: the number of rows. How many jobs are in your dataset. If you have 50 plating jobs from last quarter, n = 50.
p: the number of columns (other than the target). How many things you measured per job. Bath temperature, line speed, surface prep score, plating thickness, age of solution, operator skill, customer tier, part complexity: p = 8.
X: the whole table of inputs. Capital X = matrix, n rows by p columns.
y: the column you want to predict. Lower-case y = a single column. For each row, one value.
A row is a job. Columns are what you measured. The rightmost is what you want to predict. ML in one diagram.
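For the curious, the same four letters in a few lines of Python. The file name and column names are hypothetical; swap in whatever your own spreadsheet uses.

```python
import pandas as pd

# Hypothetical export: one row per plating job, one column per measurement
jobs = pd.read_csv("plating_jobs.csv")

# y: the single column you want to predict
y = jobs["plating_yield_pct"]

# X: everything else, i.e. the inputs
X = jobs.drop(columns=["plating_yield_pct"])

# n = number of rows (jobs); p = number of feature columns
n, p = X.shape
print(f"n = {n} jobs, p = {p} features")
```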
Quantitative vs categorical
One more fork before you can read anything in the field. The type of y decides which family of algorithms you need.
Quantitative: y is a number. Plating yield (87%), thickness (28 microns), cycles since service (120). Use regression methods.
Categorical: y is a label. PASS or FAIL. Customer tier A, B, or C. Use classification methods.
You'll see "regression vs classification" mentioned constantly. This is what they mean.
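In code, that fork is close to a type check on y. A rough sketch, reusing the hypothetical file from above (a real project deserves more care than a dtype test, but this is the idea):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

jobs = pd.read_csv("plating_jobs.csv")   # hypothetical file
y = jobs["qc_result"]                    # could be "PASS"/"FAIL", or a yield percentage

if is_numeric_dtype(y):
    print("y is quantitative -> regression methods")
else:
    print("y is categorical  -> classification methods")
```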
When an AI agent shop talks about "feature space," they mean the same X matrix. Each agent run is one row; each thing they measured (latency, tokens, success-flag) is a column. When they say "target variable," they mean y. Same letters, same shape.
Open any spreadsheet on your machine: production schedule, QC log, supplier scorecard. Count the rows: that's your n. Count the columns, all except the one you want to predict: that's your p. Pick the column you'd want to predict: that's your y.
Congratulations. You now have a structured ML problem. The rest is choosing the right model for it.
Lesson 2 recap
- n = rows (observations). p = columns (features). X = input matrix. y = target column.
- If y is a number, you have a regression problem. If y is a label, it's classification.
- Every ML problem reduces to: "given X, predict y." Everything else is technique.
In your own words: how would you explain n, p, X, and y to a peer at Lincoln? Save locally; have AI review for honest critique.
Y = f(X) + ε
The equation every model is solving.
Watch this first: a 75-second primer for what's below.
Reducible vs irreducible error
Your model's total error has two parts.
Reducible error: improves when you pick a better model or add better features. This is the part you fight against. Most of ML practice is shrinking it.
Irreducible error: locked in by the world. Random measurement noise, things you can't observe, factors that vary day-to-day. Lincoln framing: even a perfect model can't predict scrap rate to the decimal; vibration, humidity, operator focus, and micro power fluctuations all carry noise no spreadsheet captures.
Knowing the difference saves your sanity. When your model isn't getting better, you have to ask: am I fighting reducible error (try harder) or irreducible error (this is the floor)?
Even with the perfect model, individual outcomes scatter. That scatter is ε. It's a floor: find it, accept it, move on.
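You can watch the floor appear in a small simulation. Everything below is invented (the "true" relationship, the noise level), but the punchline holds: even the perfect model's MSE can't drop below the variance of ε.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Invented truth: scrap rate depends on bath temperature, plus noise (the ε)
bath_temp = rng.uniform(75, 95, size=2000)
true_f = 0.05 * (bath_temp - 85) ** 2 + 2.0          # the pattern a model could learn
noise = rng.normal(0, 1.0, size=2000)                # irreducible: nothing predicts this
scrap_rate = true_f + noise

# A too-simple model: always predict the overall average (lots of reducible error)
mse_mean_model = mean_squared_error(scrap_rate, np.full_like(scrap_rate, scrap_rate.mean()))

# The perfect model: predicts true_f exactly. Its remaining error is pure ε.
mse_perfect = mean_squared_error(scrap_rate, true_f)

print(f"Always-predict-the-mean MSE: {mse_mean_model:.2f}")
print(f"Perfect-model MSE (floor):   {mse_perfect:.2f}   (about the noise variance, 1.0)")
```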
Two reasons we estimate f
Same equation, two very different goals. Knowing which goal you have changes which model you should choose.
"Just give me ΕΆ"
You don't care why the model says what it says. You just want accurate predictions. Black-box models are fine. Lincoln framing: "will this batch pass QC?" β it's fine if the model can't explain why, as long as it's right.
"Tell me which X drives Y"
You want to understand the relationship. Which lever moves the outcome? You need a clear, interpretable model. Lincoln framing: "which setup parameter has the biggest impact on yield?" Answers like this change how the floor runs.
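Both goals can come out of the same fitted model; the difference is which part of it you read. A sketch with invented numbers: prediction reads the output, inference reads the coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented jobs: [bath_temp_C, line_speed_ppm, surface_prep_score] -> yield %
X = np.array([[78, 12, 7], [80, 13, 8], [82, 14, 6], [85, 16, 9],
              [88, 18, 5], [90, 19, 8], [84, 15, 7], [86, 17, 6]])
y = np.array([88.1, 89.0, 87.2, 91.5, 84.0, 88.8, 89.4, 86.1])

model = LinearRegression().fit(X, y)

# Prediction: "just give me the number" for a new job
print("Predicted yield:", model.predict([[83, 14, 8]]))

# Inference: "which X drives Y?" Read the fitted coefficients.
for name, coef in zip(["bath_temp", "line_speed", "surface_prep"], model.coef_):
    print(f"{name}: {coef:+.2f} yield points per unit increase")
```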
Fit a curve through Lincoln's defect data
Drag the four amber handles up and down. Try to get your error as low as possible.
The amber dashed band is the irreducible-error floor. You can never beat it, only approach it. The closer your MSE gets to 1.0, the closer you are to the best any model could do.
When you ask "why did this AI agent fail?" you're doing inference. When you ask "which agent will succeed on this prompt?" you're doing prediction. The same X, the same Y, the same ε; the goal changes the model.
Take the prediction problem you picked in Lesson 1. Ask yourself: do I want a number or a class (prediction), or do I want to know which lever to pull (inference)?
If your boss is going to ask "why?", choose an inference-friendly model later (linear regression, decision tree). If they only need a number on the dashboard, anything goes.
Lesson 3 recap
- Every model is hunting for f in Y = f(X) + ε. The ε is irreducible; accept it.
- Reducible error you can fight (better model, better features). Irreducible error is the floor.
- Two goals split the world: prediction (just be accurate) vs inference (explain the relationship).
In your own words: how would you explain Y = f(X) + ε and the noise floor? Save locally; have AI review for honest critique.
How to find f
Parametric vs non-parametric. Flexibility vs interpretability.
Watch this first: a 70-second primer for what's below.
Parametric: pick the shape, fit the numbers
Linear regression is the classic example. You assume f is a straight line. Now you only have two numbers to find: slope and intercept. Cheap, fast, easy to explain.
The tax: if the truth is curved and you assumed a line, you'll always be a little wrong. Doesn't matter how much data you throw at it. Your assumption is the ceiling.
Non-parametric: let the data lead
No assumed shape. The fit can wiggle as much as it needs to follow the data. KNN, decision trees, splines: all non-parametric.
The tax: you need a lot more data to get a stable answer. With 30 jobs you can fit a straight line confidently. With 30 jobs and a wiggly non-parametric model, you're chasing noise.
Same scatter, three different commitments. Pick what your problem calls for, and what your data can support.
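Here is what the two commitments look like in code, on one invented curved dataset: a straight line (parametric, only slope and intercept to learn) next to K-nearest-neighbors regression (non-parametric, no assumed shape).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

# Invented curved truth: yield peaks near 85 degrees, plus noise
bath_temp = np.sort(rng.uniform(75, 95, 60)).reshape(-1, 1)
yield_pct = 92 - 0.15 * (bath_temp.ravel() - 85) ** 2 + rng.normal(0, 1, 60)

line = LinearRegression().fit(bath_temp, yield_pct)                  # parametric
knn = KNeighborsRegressor(n_neighbors=7).fit(bath_temp, yield_pct)   # non-parametric

for name, model in [("straight line", line), ("KNN, k=7", knn)]:
    mse = mean_squared_error(yield_pct, model.predict(bath_temp))
    print(f"{name:13s} fit error (MSE): {mse:.2f}")
```

The line's assumption caps how well it can ever follow the curve; the KNN fit bends with the data but needs more points before you'd trust it.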
The flexibility / interpretability tradeoff
Here's the rule that matters in practice: more flexible models are harder to explain.
A linear regression gives you a one-line answer: "yield rises 0.4% for every degree increase in bath temp." Your boss can defend it. A wiggly non-parametric fit gives you "the model says so." Your boss is unlikely to defend that.
Choose your model based on who's asking and what they need. Sometimes accuracy wins, sometimes explainability wins. There's no universal right answer.
Pick a shape for f
Same Lincoln bath-temperature data. Three different shapes. Click each one and watch what happens.
Linear is one straight line. Cheap, clear, but can't bend.
A logistic regression for AI agent routing is parametric: clean rules you can defend in a postmortem. A deep neural network behaves like a non-parametric model: it assumes almost no shape, eats data for breakfast, and good luck explaining a single decision.
When leadership asks "why is the model predicting that?", the parametric model gives you a one-line answer. The non-parametric model gives you "because patterns." Choose based on who's listening.
Default first move: try the simplest parametric model. If it's good enough, ship it. If it's not, level up.
Lesson 4 recap
- Parametric = pick a shape (linear, polynomial), fit the numbers. Cheap, clear, can be wrong if the shape is wrong.
- Non-parametric = let the data shape itself. Flexible, hungry for data, harder to explain.
- More flexible = more accurate (sometimes) but less interpretable. The tradeoff is the whole game.
In your own words: how would you explain parametric vs non-parametric models, and the defensibility tradeoff? Save locally; have AI review for honest critique.
Did it actually work?
MSE and the train/test split. The non-negotiable pre-flight.
Watch this first: a 65-second primer for what's below.
MSE: Mean Squared Error
The standard score for a regression model. For each prediction, take (prediction − actual), square it, then average across all predictions.
Why squared? So big misses count more than small ones. A prediction that's off by 10 is much worse than ten predictions off by 1. The square makes that bite.
Lower MSE = better. Period. But: which data did you measure it on? That's the whole question.
The train/test split
Hold out 20% of your data before you train. Train the model on 80%. Score it on the 20% it has never seen. That score is your honest estimate of how it'll do in the real world.
Anything else is cheating. If the model has seen the data during training, asking it to predict that data is like giving a student the answer key during the exam. They'll ace it, and the score will tell you nothing.
Hide 20% before training. Score on it after. Anything else is fiction.
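If you ever do open a notebook, the whole pre-flight is a handful of lines. A minimal sketch, reusing the hypothetical plating-jobs file from Lesson 2:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

jobs = pd.read_csv("plating_jobs.csv")                 # hypothetical file
X = jobs.drop(columns=["plating_yield_pct"])
y = jobs["plating_yield_pct"]

# Hide 20% before training; random_state only makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

print("Train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("Test MSE: ", mean_squared_error(y_test, model.predict(X_test)))   # the honest number
```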
Overfitting: the trap
A super-flexible model can memorize the training data. It can fit every noisy point exactly. Train MSE drops to near zero. The model looks brilliant. You ship it.
Then production data arrives. The same model is suddenly catastrophic. Why? Because it learned the noise, not the signal. Train MSE was a lie.
The only honest metric is the one measured on data the model hasn't seen. That's the rule. Memorize it.
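A small demonstration of the trap, on invented data where the truth really is a straight line. A degree-15 polynomial is flexible enough to chase the noise; the train score rewards it, the test score exposes it.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40).reshape(-1, 1)
y = 3 + 5 * x.ravel() + rng.normal(0, 1, 40)      # truth: a line, plus noise

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in [1, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(x_tr))
    test_mse = mean_squared_error(y_te, model.predict(x_te))
    print(f"degree {degree:2d}: train MSE = {train_mse:6.2f}, test MSE = {test_mse:6.2f}")
```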
Drag your predictions to fit Lincoln's daily yield
Then flip the test-set switch and see what you actually built.
A tight fit on training data isn't proof of anything. Holdout is. Drag every handle right onto its cyan dot and train MSE goes to ~0. Then reveal the test set and watch what happens to test MSE.
When an AI shop says "we evaluated on a held-out set," this is what they mean. If they don't say that, ignore the accuracy number. There's a 90% chance they trained and tested on the same data, which tells you nothing.
Never trust an accuracy number that doesn't tell you what it was tested on. Always ask: "was this measured on data the model has seen, or data it hadn't seen?"
If a vendor pitches you a model with 95% accuracy and can't answer that question, they don't know what they built. Run.
Lesson 5 recap
- MSE = average squared prediction error. Lower is better, but only on the right data.
- Train/test split = hide 20%, train on 80%, score on the hidden 20%. Non-negotiable.
- Overfitting happens when a flexible model memorizes the training noise. Train MSE looks great, test MSE blows up.
In your own words: how would you explain MSE and the train/test split? Save locally; have AI review for honest critique.
Bias vs Variance
The central tradeoff. Move the slider. Feel it click.
Watch this first: a 2-minute primer for what's below. The flagship.
Bias: error from being too simple
Bias is what happens when your model can't bend enough to follow the truth. You assume defect rate is a flat line, but it's actually a curve. No matter how much data you give it, the line will always be wrong in the middle. That's bias.
High bias = the model is too rigid. It misses real patterns. The error is built into the assumption.
Variance: error from being too clingy
Variance is what happens when your model bends too eagerly. It hugs every data point, including the noisy ones. Re-train on a slightly different sample and the fit changes wildly. That instability is variance.
High variance = the model is too sensitive. It's chasing noise. Predictions on new data are unstable.
Bias is a fence. Variance is a windsock. Just-right is the curve in the middle that ignores both extremes.
The U-shape
As you increase a model's flexibility, two things happen at once. Bias drops (more flexible models can follow the truth). Variance rises (more flexible models chase noise).
Total error = bias² + variance + irreducible noise. Add it up across flexibility levels and you get a U-shape. The bottom of the U is the sweet spot: flexible enough to capture the pattern, not so flexible that it's chasing noise.
Every modeling decision is just trying to find the bottom of that U.
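You can trace the U yourself with a few lines of scikit-learn, using K in KNN regression as the flexibility knob (small K = very flexible, large K = very rigid). The data below is invented; the shape of the result is the point.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
x = rng.uniform(75, 95, 300).reshape(-1, 1)                        # invented bath temps
y = 92 - 0.15 * (x.ravel() - 85) ** 2 + rng.normal(0, 1, 300)      # curved truth + noise

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

# Sweep flexibility: tiny K chases noise (variance), huge K flattens the curve (bias)
for k in [1, 3, 10, 30, 100, 200]:
    model = KNeighborsRegressor(n_neighbors=k).fit(x_tr, y_tr)
    test_mse = mean_squared_error(y_te, model.predict(x_te))
    print(f"K = {k:3d}   test MSE = {test_mse:.2f}")
```

Test MSE should fall, bottom out somewhere in the middle, then climb again; the minimum is the bottom of the U.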
The Bias–Variance Playground
Move the slider. Watch the curve, the bias, and the variance fight each other.
The dotted vertical line on the right chart marks where you are on the U-shape. The green dashed line marks the sweet spot: the flexibility level where test MSE bottoms out. Watch test MSE drop, bottom, then climb again as variance takes over.
When an AI agent shop says "we hyperparameter-tuned for the lowest validation loss," they're standing on the U-shape, looking for the dip. Same playground, different vocabulary.
When a model performs perfectly on past data, ask one question: how does it do on a held-out month? If the gap is large, that's variance speaking. The model memorized history. Ship a simpler one.
And when leadership says "make it more accurate," push back. Past a point, more flexibility makes accuracy worse, not better. The U-shape is real.
Lesson 6 recap
- Bias = error from too rigid a model. Misses real patterns.
- Variance = error from too flexible a model. Chases noise.
- Total error = bias² + variance + irreducible noise. As flexibility grows, bias drops, variance rises.
- The sweet spot is the bottom of the U. Every modeling decision is hunting for it.
In your own words: how would you explain bias, variance, and the U-shape? Save locally; have AI review for honest critique.
KNN: your first algorithm
Classify by neighbors. The simplest serious ML there is.
Watch this first: a 75-second primer for what's below.
How it works in one sentence
Given a new point, find the K closest points in your training data, take a majority vote. That's it.
Lincoln framing: a new plating job arrives with bath temp 82°C and line speed 14 parts/min. Look up the 10 most similar past jobs. If 7 passed and 3 failed, predict PASS. If 3 passed and 7 failed, predict FAIL. Done.
That's a real, defensible model. No coefficients. No training. The "model" is just your past data.
Picking K
K = 1 means you trust whoever's standing closest, even if they're an outlier. One weird past job dominates every prediction near it. Wiggly decision boundary. High variance.
K = 100 means you average over basically your whole dataset. Too smooth to catch local patterns. Blurry decision boundary. High bias.
The right K is somewhere in between, and yes, it's a bias-variance tradeoff. The U-shape from Lesson 6 shows up here too. Sound familiar?
The black cross is the new job. Find K nearest. Take a vote. That's the entire algorithm.
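The entire algorithm really is that short. A minimal sketch with scikit-learn's KNeighborsClassifier, on invented jobs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Past jobs: [bath_temp_C, line_speed_ppm], and whether each passed final QC
past_jobs = np.array([[78, 12], [79, 12], [81, 13], [82, 14], [84, 15],
                      [86, 16], [88, 17], [89, 18], [90, 18], [91, 19]])
passed_qc = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])   # 1 = PASS, 0 = FAIL

# "Training" just stores the data; the model is the past jobs
knn = KNeighborsClassifier(n_neighbors=3).fit(past_jobs, passed_qc)

# New job arrives: find the 3 most similar past jobs and take a vote
new_job = [[83, 14]]
print("Prediction:", "PASS" if knn.predict(new_job)[0] == 1 else "FAIL")
print("Vote split [P(fail), P(pass)]:", knn.predict_proba(new_job)[0])
```

One practical note: KNN measures raw distances, so features on very different scales usually get standardized first.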
Click to add data. Slide K.
You're predicting "will this plating job pass QC?" from two setup parameters.
Background color = what KNN would predict at every point. Click anywhere to add a labeled job and watch the boundary react. Try the K=1 preset, then K=25: same data, totally different model.
Vector databases doing semantic search are KNN at scale. "Find the K embeddings closest to my query" is KNN with cosine distance (usually an approximate, speed-optimized version). Every RAG pipeline, every recommendation engine, every AI agent's memory lookup: KNN under the hood.
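If you want to see that claim concretely, here is a toy version of the lookup in plain NumPy: cosine similarity between a query vector and a pile of stored embeddings, top K by similarity. The vectors are random stand-ins; real systems use approximate indexes for speed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented "memory": 1,000 stored embeddings, 64 dimensions each
memory = rng.normal(size=(1000, 64))
query = rng.normal(size=64)

# Cosine similarity = dot product of unit-length vectors
memory_unit = memory / np.linalg.norm(memory, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)
similarity = memory_unit @ query_unit

# The K nearest neighbors of the query, most similar first
K = 5
top_k = np.argsort(similarity)[::-1][:K]
print("Closest stored items:", top_k)
print("Their similarities:  ", np.round(similarity[top_k], 3))
```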
KNN works great for small datasets. If you have 200 plating jobs and want to predict which will fail QC, KNN is a perfectly defensible first pass. No fancy infrastructure needed.
The catch: KNN gets slow with millions of rows (it has to search everything every time). For Lincoln-scale data, that's a non-issue.
Lesson 7 recap
- KNN: find the K nearest past examples, take a majority vote. The model is the data.
- Small K = wiggly boundary, high variance. Large K = smooth boundary, high bias. Same U-shape.
- Perfect first algorithm for small Lincoln-sized datasets. Defensible, simple, surprisingly accurate.
In your own words: how would you explain KNN to someone who's never seen ML? Save locally; have AI review for honest critique.
Predictive Maintenance for Lincoln plating lines
The thing you can carry into work tomorrow.
Watch this first: a 65-second primer for what's below.
The problem
50 machines on the floor. Some will fail in the next 7 days, some won't. We have 8 features per machine: age, cycles since service, vibration, bath temp drift, operator skill, part complexity, customer tier, and one decoy that looks important but isn't. Build a KNN classifier that predicts failure on a held-out test set.
What you'll do
Step 1. Pick which features go into the model. Some help. One is a trap.
Step 2. Pick K. (Remember Lessons 6 and 7: find the U-shape sweet spot.)
Step 3. Watch test accuracy and the confusion matrix update live as you tweak.
Step 4. When you're happy, generate the spec sheet: your shippable artifact.
Why it matters
If you can flag a failure 7 days before it happens, you schedule maintenance during a planned slot instead of an emergency stop. That's literally Planning + Cost Savings + Operations all at once. It's the kind of model leadership notices.
Predict which Lincoln plating lines fail in 7 days
Pick features. Pick K. See test accuracy. Then ship a spec sheet.
| | Pred Fail | Pred OK |
|---|---|---|
| Actual Fail | True Positive | False Negative |
| Actual OK | False Positive | True Negative |
High True Positive + low False Negative = catching failures. That's the ballgame.
Aim for accuracy ≥ 0.80 on the test set. Try removing the decoy. Try different Ks. The best feature set + K combo isn't always obvious.
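The widget runs this exact loop for you; here is what it looks like written out, assuming a hypothetical machines.csv with the eight columns described above (all numeric) plus a 0/1 failed_within_7d column.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

machines = pd.read_csv("machines.csv")                     # hypothetical file

# Step 1: pick features (try dropping the suspected decoy and compare)
features = ["age", "cycles_since_service", "vibration", "bath_temp_drift",
            "operator_skill", "part_complexity", "customer_tier"]
X = machines[features]
y = machines["failed_within_7d"]

# Held-out test set: the only honest score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 2: pick K (sweep it and watch the bias-variance tradeoff)
model = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)

# Step 3: score on machines the model has never seen
pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, pred))
print("Confusion matrix (rows = actual, columns = predicted):")
print(confusion_matrix(y_test, pred))
```

An AutoML tool automates the sweep over features and models, but this is the loop it's automating.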
Every observability platform that "predicts incidents before they happen" is doing this exact loop. Pick features. Pick a model. Validate on past data. Ship. The flavor differs; the playbook doesn't.
Pull a real spreadsheet from work. Same 50ish rows, your real features, your real failed_within_7d column.
Run it through any AutoML tool (Vertex AI, H2O, SageMaker Autopilot, DataRobot) using what you learned about features and bias-variance. You can pick the right model. You can defend it. You're ready.
This is what the rest of the catalogue ships you toward.
Capstone recap: what you just did
- You picked features deliberately (and learned to spot decoys that hurt accuracy).
- You tuned K for the bias-variance sweet spot: same U-shape from Lesson 6.
- You evaluated on a held-out test set: the only honest metric.
- You generated a printable spec sheet: your shippable artifact for leadership.
In your own words: how would you explain the predictive maintenance loop you just built (pick features, pick K, score, ship)? Save locally; have AI review for honest critique.